I need to parse a FWF to a df, but the system that exports the file instead of &
symbol exports &
.
This break the fixed-width as the lines with &
have now more characters so I cant use read_fwf. I can import it with the below code, but this gives me a single column that I must later split.
import pandas as pd
cnv = lambda txt: txt.replace('&', '&')
df = pd.read_csv('AccountStatement1.txt', skiprows=4, skipfooter=3, engine='python', sep='$%^#~&', converters= { i: cnv for i in range(1) })
df
i use sep='$%^#~&'
so i can have only one column and correct the text using the converter.
What is the proper solution to this?
Sample of the text file:
=======================================================================================================================================================================================================================================================================================================================================================
From: Transaction Date: 21/05/2021 To: Transaction Date: 23/06/2021 IBAN:: CYxxxxxx0000000000000 Currency: EUR Previous Statement Balance: 1,111.10 BIC: xxxxxxxx
=======================================================================================================================================================================================================================================================================================================================================================
Transaction Date Transaction details Reference No. Description 1 Description 2 Value date Debit Credit Balance
27/05/2021 CHQ: 12568987 26/05/2021 645.00 9,708.70
27/05/2021 DEBIT EB2021057554434221149 xxxx xxxxxxxxx xxxxxx 27/05/2021 0,888.36 3,723.74
28/05/2021 I2456787437452 B/O: xxxxxxxxxxxxxxx LTD TRANSFER xxxxxxxxx xxxxxxxxxx 27/05/2021 19,002.00 13,755.74
28/05/2021 INWARD TRANSFER COMMISSION CY21jhjh884786 I2107675689452 28/05/2021 10.00 15,723.74
31/05/2021 ATM/POS DEBIT jhgjhkjhjk jkh f4 1211 xxxxxx xxxxx & xx xxxxx 27/05/2021 60.00 52,680.74
31/05/2021 Service Charges MONTHLY MAINTENANCE FEE 31/05/2021 35.00 73,645.74
01/06/2021 Service Charges STATEMENT FEE - MAY 2021 31/05/2021 5.00 19,645.74
02/06/2021 ATM/POS DEBIT POS 567521 124454 1211 xxxxxxxxxxxx & Exxxxxxx 31/05/2021 170.00 09,320.74
03/06/2021 CHQ: 13456784 02/06/2021 80.00 10,230.74
04/06/2021 ATM/POS DEBIT POS 345671 124258 1278 xxxxxxxxxxxx & xxxxxxxx 02/06/2021 940.00 23,960.74
08/06/2021 ATM/POS DEBIT POS 345671 125678 1278 xxxxxxx xxxxx xxxxx 04/06/2021 13.20 13,347.54
15/06/2021 ATM/POS DEBIT POS 145671 156612 1671 xxxx xxxxxxxxxxxxxx680 11/06/2021 25.53 13,322.01
=======================================================================================================================================================================================================================================================================================================================================================
Number of records: 22 IBAN:: xxxx234567898765434567876545 Currency: EUR Current Statement Balance: 0,000.00
=======================================================================================================================================================================================================================================================================================================================================================
Maybe you could load the file, replace the problematic characters, then read it as fixed width with
pd.read_fwf
usingio.StringIO
to make an in-memory buffer:This is the file’s content, unaligned by
&
as you indicate: