Parse Problematic Fixed width text file to a pandas dataframe

443 views Asked by At

I need to parse a FWF to a df, but the system that exports the file instead of & symbol exports &. This break the fixed-width as the lines with & have now more characters so I cant use read_fwf. I can import it with the below code, but this gives me a single column that I must later split.

import pandas as pd
cnv = lambda txt: txt.replace('&', '&')
df = pd.read_csv('AccountStatement1.txt', skiprows=4, skipfooter=3, engine='python', sep='$%^#~&', converters= { i: cnv for i in range(1) })
df

i use sep='$%^#~&' so i can have only one column and correct the text using the converter. What is the proper solution to this?

Sample of the text file:

=======================================================================================================================================================================================================================================================================================================================================================
From: Transaction Date: 21/05/2021 To: Transaction Date: 23/06/2021 IBAN:: CYxxxxxx0000000000000 Currency: EUR Previous Statement Balance:       1,111.10 BIC: xxxxxxxx
=======================================================================================================================================================================================================================================================================================================================================================

Transaction Date Transaction details                  Reference No.         Description 1                        Description 2                        Value date                    Debit               Credit              Balance  
27/05/2021       CHQ: 12568987                                                                                                                        26/05/2021                   645.00                                  9,708.70  
27/05/2021       DEBIT                                EB2021057554434221149 xxxx xxxxxxxxx xxxxxx                                                     27/05/2021                 0,888.36                                  3,723.74  
28/05/2021       I2456787437452                                             B/O: xxxxxxxxxxxxxxx LTD             TRANSFER xxxxxxxxx xxxxxxxxxx        27/05/2021                                     19,002.00            13,755.74  
28/05/2021       INWARD TRANSFER COMMISSION                                 CY21jhjh884786                       I2107675689452                       28/05/2021                    10.00                                 15,723.74  
31/05/2021       ATM/POS DEBIT                                              jhgjhkjhjk jkh f4         1211       xxxxxx xxxxx & xx xxxxx              27/05/2021                    60.00                                 52,680.74  
31/05/2021       Service Charges                                            MONTHLY MAINTENANCE FEE                                                   31/05/2021                    35.00                                 73,645.74  
01/06/2021       Service Charges                                            STATEMENT FEE - MAY 2021                                                  31/05/2021                     5.00                                 19,645.74  
02/06/2021       ATM/POS DEBIT                                              POS 567521 124454         1211       xxxxxxxxxxxx & Exxxxxxx              31/05/2021                   170.00                                 09,320.74  
03/06/2021       CHQ: 13456784                                                                                                                        02/06/2021                    80.00                                 10,230.74  
04/06/2021       ATM/POS DEBIT                                              POS 345671 124258         1278       xxxxxxxxxxxx & xxxxxxxx              02/06/2021                   940.00                                 23,960.74  
08/06/2021       ATM/POS DEBIT                                              POS 345671 125678         1278       xxxxxxx xxxxx xxxxx                  04/06/2021                    13.20                                 13,347.54  
15/06/2021       ATM/POS DEBIT                                              POS 145671 156612         1671       xxxx xxxxxxxxxxxxxx680               11/06/2021                    25.53                                 13,322.01  
=======================================================================================================================================================================================================================================================================================================================================================
Number of records:     22          IBAN:: xxxx234567898765434567876545     Currency: EUR     Current Statement Balance:       0,000.00
=======================================================================================================================================================================================================================================================================================================================================================

1

There are 1 answers

0
Cimbali On BEST ANSWER

Maybe you could load the file, replace the problematic characters, then read it as fixed width with pd.read_fwf using io.StringIO to make an in-memory buffer:

>>> import io, pandas as pd
>>> with open('test.csv') as f:
...   lines = f.readlines()
>>> pd.read_fwf(io.StringIO(''.join(lines[4:-3]).replace('&', '&')))
    a    b  c
0  11  fo&  0

This is the file’s content, unaligned by & as you indicate:

>>> print(''.join(lines))
foo
bar
baz
qux
 a   b  c
11 fo&  0
ig
nore
me