How to see and change linefeeds in CSV file in unix

477 views Asked by At

I'm trying to load some data into netezza but it won't let me as my file has linefeeds at the end of each line

my questions are: I can see the linefeeds if I look in notepad++, how do i do the equivalent in unix?

How can I change the linefeeds to add an escape character to the beggining of each linefeed so I can load into netezza?

1

There are 1 answers

0
ScottMcG On

Linefeeds are the expected character at the end of the line for nzload. I think it's carriage returns you must be having problems with.

To your first question, you can use "vi -b " to prevent vi from detecting that the file came from DOS and trying to hide the carriage returns that are causing you heartache. If you see "[dos]" at the bottom of the vi screen next to the filename, then you know there are carriage return characters there that you aren't seeing.

If your data file is to big to open in vi you can use "od -a filename" to inspect the data for carriage returns or other control characters.

If you want to retain the carriage returns in your data, whether at the end of the lines or not you can escape them with a regex like so:

[nz@netezza ~]$ cat data.txt
ABC
DEF
[nz@netezza ~]$ od -a data.txt
0000000   A   B   C  cr  nl   D   E   F  cr  nl
0000012
[nz@netezza ~]$ perl -p -e 's/\r/\\\r/g' data.txt > data_escaped.txt
[nz@netezza ~]$ cat data_escaped.txt
ABC\
DEF\
[nz@netezza ~]$ od -a data_escaped.txt
0000000   A   B   C   \  cr  nl   D   E   F   \  cr  nl
0000014

Then you can use nzload with the -escapechar option to load the data. However, if you use the -escapechar option, you have to account for the possibility of having that escapechar () in your data. A more thorough prep would be like so:

[nz@netezza ~]$ cat data.txt
A\BC\
DE\F
[nz@netezza ~]$ od -a data.txt
0000000   A   \   B   C   \  cr  nl   D   E   \   F  cr  nl
0000015
[nz@netezza ~]$ perl -p -e 's/\\/\\\\/g' data.txt | perl -p -e 's/\r/\\\r/g' > data_escaped.txt
[nz@netezza ~]$ cat data_escaped.txt
A\\BC\\\
DE\\F\
[nz@netezza ~]$ od -a data_escaped.txt
0000000   A   \   \   B   C   \   \   \  cr  nl   D   E   \   \   F   \
0000020  cr  nl
0000022

You can then use nzload and verify the results.

[nz@netezza ~]$ nzload -db testdb -t crtest -df data_escaped.txt -escapechar \\ -ctrlchars -crinstring
Load session of table 'CRTEST' completed successfully
[nz@netezza ~]$ nzsql -d testdb -c "select * from crtest"
  COL1
--------
 A\BC\
 DE\F
(2 rows)

[nz@netezza ~]$ nzsql -t -d testdb -c "select * from crtest"  | od -a
0000000  sp   A   \   B   C   \  cr  nl  sp   D   E   \   F  cr  nl  nl
0000020

Or, you can could just simply eradicate the carriage returns with dos2unix if they aren't actually part of the data you want, and are simply an unwanted artifact of the process that created your file, and be done with it.

[nz@netezza ~]$ cat data.txt
ABC
DEF
[nz@netezza ~]$ od -a data.txt
0000000   A   B   C  cr  nl   D   E   F  cr  nl
0000012
[nz@netezza ~]$ dos2unix data.txt
dos2unix: converting file data.txt to UNIX format ...
[nz@netezza ~]$ od -a data.txt
0000000   A   B   C  nl   D   E   F  nl
0000010