I am not even sure if I am framing this question right, but I will try. I have a bunch of huge text files generated from an Oracle export on a Linux system. Each file is about 30 GB in size and I have about 50 of them. The goal is to load this data into Azure SQL Data Warehouse. BCP is not the right approach here given the size of the data, so I had to use Polybase. After converting the files from ASCII to UTF-8 encoding, I ran into an issue while querying the external tables: Polybase doesn't play well with these fixed-width text files, where each logical row is wrapped across several lines that each end in a line break.
The text file looks something like this:
101,102,103,104,105,106,107
108,108,109,110,111,112,113
114,115,116,117,118,119,120
121,122,123
--nothing here, just a blank line
201,202,203,204,205,206,207
208,209,210,211,212,213,214
215,216,217
Polybase tries to process the line from 101 through 107 as a row and errors out, complaining that there were not enough columns to process.
Here is what I think is happening: because of the fixed width and the line breaks, Polybase treats every line break as a row delimiter, so each wrapped line is read as an incomplete row.
How do I convert these files so that each logical record sits on a single line, like below:
101,102,103,104,105,106,107,108,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123{CR}{LF}
201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217{CR}{LF}
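To make the goal concrete, here is a rough sketch of the kind of conversion I have in mind, assuming the blank line is the only thing separating logical records and the files can be processed in a single streaming pass (the script name and file paths are just placeholders). I am not sure this is the best approach for 30 GB files, so I am open to better options:

    #!/usr/bin/env python3
    # merge_records.py (hypothetical name)
    # Sketch: join the wrapped lines of each logical record into one
    # CRLF-terminated row, treating a blank line as the record separator.
    # Streams the input, so it should not need to hold 30 GB in memory.
    # Assumes UTF-8 input and no trailing commas on the wrapped lines.
    import sys

    def merge_records(in_path, out_path):
        with open(in_path, "r", encoding="utf-8", newline="") as src, \
             open(out_path, "w", encoding="utf-8", newline="") as dst:
            parts = []
            for line in src:
                stripped = line.strip()
                if stripped:
                    # Still inside a record: collect this wrapped piece
                    parts.append(stripped)
                elif parts:
                    # Blank line: flush the collected pieces as one row
                    dst.write(",".join(parts) + "\r\n")
                    parts = []
            if parts:
                # Flush the last record if the file does not end with a blank line
                dst.write(",".join(parts) + "\r\n")

    if __name__ == "__main__":
        merge_records(sys.argv[1], sys.argv[2])

I would run it as something like python3 merge_records.py input.txt output.txt, once per export file.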