I recently received some data items as 99 pipe-delimited txt files. In some of them (I'll use dataaddress.txt as an example) there is a return inside the address, e.g.
14 MakeUp Road
Hull
HU99 9HU
It's coming out on 3 rows rather than one. Bear in mind there is data before and after this address, separated by pipes. It just seems to be this address issue that is causing me problems in loading the txt file correctly using SSIS.
Rather than go back to source, I wondered if there is a way to manipulate the txt file to remove these carriage returns without affecting the row-end returns, if that makes sense.
I would use `sed` or `awk`. I will show you how to do this with `awk`, because it is more platform independent. If you do not have `awk`, you can download a mawk binary from http://invisible-island.net/mawk/mawk.html.

The idea is as follows: tell `awk` that your line separator is something different, not carriage return or line feed. I will use comma. Then use a regular expression to replace the string that you do not like.
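As a rough illustration of that idea (not the code this answer originally showed), here is a minimal sketch that swaps the comma for a different separator. It assumes the real row ends are CRLF while the stray breaks inside the address are bare LF, which is common in Windows exports but should be checked against your files. It also assumes an awk that accepts a multi-character `RS`, such as gawk or mawk. The function name `fix_embedded_lf` and the sample names are made up.

```shell
#!/bin/sh
# Sketch only. Assumptions: rows end in CRLF, stray breaks inside a field are bare LF,
# and awk accepts a multi-character RS (gawk and mawk do; strict POSIX awk may not).
fix_embedded_lf() {
    awk 'BEGIN { RS = "\r\n"; ORS = "\r\n" }   # one record per real row
         { gsub(/\n/, " "); print }            # a stray LF inside a row becomes a space
        '
}

# Demo with the address from the question; the other fields are invented.
printf '1|Ann|14 MakeUp Road\nHull\nHU99 9HU|x\r\n2|Bob|1 Street|y\r\n' | fix_embedded_lf
```

If your files use the same line ending for both kinds of break, this sketch cannot tell them apart and a different test (for example, counting pipes per row) is needed.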
Here is a test file I created. Save it as `test.txt`:

And call `awk` as follows:

I suggest that you save the awk code into a file named `cleanup.awk`. Here is the better formatted code with explanations.

Using the awk file, you can execute the replacement as follows:
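A hypothetical stand-in for such a `test.txt` and `cleanup.awk`, not the answer's own listings, might look like the sketch below. It uses a different test from the record-separator trick: it keeps joining physical lines until a row contains the expected number of pipes. The four-field layout, the sample values, and the `nfields` variable are all assumptions for illustration, and the approach only works when the broken field is not the last one in the row.

```shell
#!/bin/sh
# Hypothetical stand-in files; the layout id|name|address|phone is made up.
dir=$(mktemp -d)

# test.txt: row 1 has its address broken over three physical lines
cat > "$dir/test.txt" <<'EOF'
1|Ann|14 MakeUp Road
Hull
HU99 9HU|01234
2|Bob|1 Other Street|05678
EOF

cat > "$dir/cleanup.awk" <<'EOF'
# cleanup.awk (sketch): join physical lines until a row has all its pipes.
# nfields is passed with -v and is the expected number of fields per row.
{
    sub(/\r$/, "")                        # drop a trailing CR so CRLF files work too
    row = (row == "") ? $0 : row " " $0   # glue continuation lines with a space
    tmp = row
    if (gsub(/\|/, "", tmp) >= nfields - 1) {   # gsub returns the number of pipes
        print row
        row = ""
    }
}
END { if (row != "") print row }          # flush an incomplete last row
EOF

# Run the awk file against the test file and show the repaired rows
awk -v nfields=4 -f "$dir/cleanup.awk" "$dir/test.txt" > "$dir/clean.txt"
cat "$dir/clean.txt"
```

With the sample above, the three address lines end up back on one row, separated by spaces, and the second row passes through unchanged.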
To process multiple files, you can create a bash script:
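A minimal sketch of such a script follows. It assumes the awk code lives in a file such as `cleanup.awk` and writes the cleaned copies to a separate directory so the originals stay untouched. The function name `clean_all`, its argument order, and the example paths are illustrative, not from the original answer.

```shell
#!/bin/bash
# Sketch: run one awk script over every .txt file in a directory.
# Usage: clean_all INPUT_DIR OUTPUT_DIR AWK_FILE
clean_all() {
    local in_dir="$1" out_dir="$2" awk_file="$3" f
    mkdir -p "$out_dir" || return 1
    for f in "$in_dir"/*.txt; do
        [ -e "$f" ] || continue            # no matches: the glob stays literal, skip it
        # Write the cleaned copy under the same file name in the output directory
        awk -f "$awk_file" "$f" > "$out_dir/$(basename "$f")" || return 1
    done
}

# Example call (paths are placeholders):
# clean_all ./raw ./cleaned ./cleanup.awk
```

Keeping the output in its own directory makes it easy to compare row counts before and after, and to point SSIS at the cleaned files only once they look right.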