I'm trying to read a table from a text file (or StringIO) into pandas. To accomplish this, I use pandas.read_fwf.
However, I'm facing problems with the automatic column width detection. In my case it works properly for columns 1-3 but not for column 4, which contains informal text of undefined width.
The detection works well for the first three columns, because their widths can be determined properly from the headers. The start of the 4th column can also be determined properly, as it is aligned with the corresponding header.
However, pandas refuses to put all remaining text into the 4th column.
It either creates several Unnamed: X columns with one word of the informal text per column, or it creates one named column which contains only the first word of the informal text.
Here is the column format:
CL NAME STATE INFO
some category some_name some_state some informal info text
...
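To make the layout reproducible, here is a minimal sketch that builds a comparable fixed-width table in a StringIO. The second data row and the column widths (15, 11, 12) are assumptions chosen only so that every value lines up under its header; the real file may use different widths.

```python
import io

# Hypothetical sample rows; only the first data row comes from the format above.
rows = [
    ("CL", "NAME", "STATE", "INFO"),
    ("some category", "some_name", "some_state", "some informal info text"),
    ("other category", "other_name", "other_state", "another longer free-form remark"),
]

# Pad the first three fields to assumed fixed widths; the last field is left open-ended.
sample = "\n".join(
    f"{cl:<15}{name:<11}{state:<12}{info}" for cl, name, state, info in rows
)

infile = io.StringIO(sample)
print(sample)
```

The resulting infile can be passed to pandas.read_fwf in place of a real file.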
What I'd like to achieve is that all categories end up in column 1, all names in column 2, all states in column 3 and the complete info text in column 4.
The two options I tried were:
- x1 = pandas.read_fwf(infile, infer_nrows=1)
  -> Results in an INFO column containing only the first word of the info text:

    CL NAME STATE INFO
    0 some some_name some_state some

- x2 = pandas.read_fwf(infile)
  -> Results in several unnamed columns, each containing one word of the info text:

    CL NAME ... Unnamed: 5 Unnamed: 6
    0 some category some_name ... NaN NaN
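One possible direction, sketched here under assumptions rather than as a confirmed fix: since every column start is aligned with its header, the column boundaries can be taken from the header line alone and passed to read_fwf through its colspecs parameter, leaving the end of the last column open (None) so that it runs to the end of each line. The helper name header_colspecs and the sample data are hypothetical, and the sketch assumes that no header name contains whitespace.

```python
import io
import re

import pandas as pd

# Hypothetical sample data with assumed column widths (15, 11, 12).
rows = [
    ("CL", "NAME", "STATE", "INFO"),
    ("some category", "some_name", "some_state", "some informal info text"),
    ("other category", "other_name", "other_state", "another longer free-form remark"),
]
sample = "\n".join(
    f"{cl:<15}{name:<11}{state:<12}{info}" for cl, name, state, info in rows
)


def header_colspecs(header_line):
    """Return one (start, end) pair per header token; the last end is None."""
    starts = [m.start() for m in re.finditer(r"\S+", header_line)]
    ends = starts[1:] + [None]  # None leaves the final column open-ended
    return list(zip(starts, ends))


colspecs = header_colspecs(sample.splitlines()[0])
df = pd.read_fwf(io.StringIO(sample), colspecs=colspecs)
print(df)
```

With explicit colspecs the width inference is bypassed entirely, so the number of words in the info text no longer affects how many columns are created.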