Unwarping Records In Notepad++/UltraEdit

360 views Asked by At

I work with big data, especially text files. These records have fixed number of columns (19 col) . If a record warps it becomes incomplete and the spilled fields form an incomplete new record as shown below.

Records warped

I need to un-warp the records so that they fit in their respective columns delimited by semicolon. The files can have up to 250,000 records. a faster way to check/fix this will be deeply appreciated.

Thank you.

2

There are 2 answers

0
psxls On

Normally in Notepad++ you would go to menu View and deselect the Word wrap option.

But from your screenshot, it seems more like to be a problem with newlines inserted in your data. For example between line 10 and 11, it looks like as if a newline got inserted at the 18th column. Go to View > Show Symbol > Show All Characters to validate this hypothesis. That would explain also why you have the same problem with both editors.

0
Mofi On

The UltraEdit forum contains the topics

Find lines in CSV file with less or more than X tabs within a line

How to find line breaks in fields of a CSV file and remove it?

which discuss this common problem in CSV files.

With the Perl regular expression search string ^((?!(?:[^;\r\n]*;){18}).*)\r*\n and using \1 as replace string you can eliminate line breaks on lines with less than 18 semicolons (19 data columns).

But run this replace manually and not with a Replace All. All data rows which have multiple line breaks would be formatted wrong with using a Replace All.

Example:

a1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;1
8;19
b1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19
c1;2;3;4;5;6;7;8;9;1
0;11;12;13;14;15;16;17;1
8;19
d1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19

should become

a1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19
b1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19
c1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19
d1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19

and not

a1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19
b1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19
c1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;1
8;19d1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19

So be careful with the replace and set the caret in the file back to the line above during the replace executed step by step if a data row has multiple line breaks.

With an UltraEdit macro it is possible to fix all line breaks within the data rows automatically.

The macro code is:

InsertMode
ColumnModeOff
HexOff
Top
PerlReOn
Loop 0
Find MatchCase RegExp "^((?!(?:[^;\r\n]*;){18}).*)\r*\n"
Replace "\1"
IfFound
Key HOME
Else
ExitLoop
EndIf
EndLoop
Top