I am facing an issue using the read_fwf command from the Python library pandas, same as described in this unresolved question
I want to read an ascii file containing results of numeric computations. The file contains repetitive header blocks with a variable number of lines between each of them and with a roughly described format but rather arbitrary string content. I do not want to allow any loss of information and therefore need to be able to process everything as it was originally.
Look at this examplary line:
$SUBTITLE= 48-ELEMENT CANTILEVER BEAM ON THE X-AXIS 1181
Using a
with pd.read_fwf(self.file_path, header=None,
chunksize=chunk_size, # no of lines read
colspecs=[(0, 23), (23, 41), (41, 59), (59, 72)],
engine='c', dtype={0: str}) as df_gen:
context manager to read the file, white spaces between columns will be stripped despite the seamlessly neighbouring column lengths. So it does not act as a fixed-width reader.
Cutting out the middle part of the line above with
"".join(s.dropna()).split("=")[-1].strip(" ")
with s being the row of the DataFrame containing that line, I will get a string
48-ELEMENT CANTILEVER BEAM ONTHE X-AXIS
instead of
48-ELEMENT CANTILEVER BEAM ON THE X-AXIS
.
Does anybody know an elegant solution to this issue? I am working on an approach where I will initially not cut into columns and do it for the numeric blocks after i extracted the header lines but it might increase computation time significantly.
Is there maybe a keyword in newer pandas versions that will disable the stripping?
Take bigger cuts:
Output:
Or you can just read in line by line:
A little regex magic and you can consistently extract the inner line: