parse csv file into dataframe using IO/StringIO in pandas python

37 views Asked by At

looking for a faster way to read csv file and manipulate to achieve the desired output. note that csv file has data of array type, those to be rejected; also has hex type, those to be converted to decimal.

"Index","Topic",    "Instance",     "Timestamp",                            "Variables Values"
"0",    "Topic1",   "<some info>",  "2024-01-20 04:48:55.460644901 EST",    "Tp1_var1=tp1_var1enum1 (1), tp1_var2=tp1_var2enum1 (10), tp1_var3=true, "
"1",    "Topic2",   "<some info>",  "2024-01-20 04:48:55.460644902 EST",    "Tp2_var1=10.5, tp2_var2=[1,2,3,4,5,6], tp2_var3=0x0A"
"2",    "Topic1",   "<some info>",  "2024-01-20 04:48:55.460644903 EST",    "Tp1_var1=tp1_var1enum2 (2), tp1_var2=tp1_var2enum2 (20), tp1_var3=false, "
"3",    "Topic2",   "<some info>",  "2024-01-20 04:48:55.460644904 EST",    "Tp2_var1=12.5, tp2_var2=[1,2,3,8,5,6], tp2_var3=0x0B"
"4",    "Topic1",   "<some info>",  "2024-01-20 04:48:55.460644905 EST",    "Tp1_var1=tp1_var1enum3 (3), tp1_var2=tp1_var2enum3 (30), tp1_var3=false, "
"5",    "Topic2",   "<some info>",  "2024-01-20 04:48:55.460644906 EST",    "Tp2_var1=22.1, tp2_var2=[1,1,3,4,5,6], tp2_var3=0x0C"
"6",    "Topic1",   "<some info>",  "2024-01-20 04:48:55.460644907 EST",    "Tp1_var1=tp1_var1enum4 (4), tp1_var2=tp1_var2enum4 (40), tp1_var3=true, "
"7",    "Topic1",   "<some info>",  "2024-01-20 04:48:55.460644908 EST",    "Tp1_var1=tp1_var1enum5 (5), tp1_var2=tp1_var2enum5 (50), tp1_var3=false, "
"8",    "Topic2",   "<some info>",  "2024-01-20 04:48:55.460644909 EST",    "Tp2_var1=76.3, tp2_var2=[1,2,3,4,5,9], tp2_var3=0x0D"
"9",    "Topic1",   "<some info>",  "2024-01-20 04:48:55.460644910 EST",    "Tp1_var1=tp1_var1enum6 (6), tp1_var2=tp1_var2enum6 (60), tp1_var3=false, "
"10",   "Topic2",   "<some info>",  "2024-01-20 04:48:55.460644911 EST",    "Tp2_var1=50.0, tp2_var2=[1,2,3,4,3,6], tp2_var3=0x0E"
"11",   "Topic2",   "<some info>",  "2024-01-20 04:48:55.460644912 EST",    "Tp2_var1=17.3, tp2_var2=[1,2,7,7,5,6], tp2_var3=0x0F"
"12",   "Topic1",   "<some info>",  "2024-01-20 04:48:55.460644913 EST",    "Tp1_var1=tp1_var1enum7 (7), tp1_var2=tp1_var2enum7 (70), tp1_var3=false, "
"13",   "Topic1",   "<some info>",  "2024-01-20 04:48:55.460644914 EST",    "Tp1_var1=tp1_var1enum8 (8), tp1_var2=tp1_var2enum8 (80), tp1_var3=true, "
"14",   "Topic1",   "<some info>",  "2024-01-20 04:48:55.460644915 EST",    "Tp1_var1=tp1_var1enum9 (9), tp1_var2=tp1_var2enum9 (90), tp1_var3=false, "

desired output

Timestamp                           tp1_var1    tp1_var2    tp1_var3    tp2_var1    tp2_var3
2024-01-20 04:48:55.460644901 EST   1           10          1           nan         nan
2024-01-20 04:48:55.460644902 EST   nan         nan         nan         10.5        10
2024-01-20 04:48:55.460644903 EST   2           20          0           nan         nan
2024-01-20 04:48:55.460644904 EST   nan         nan         nan         12.5        11
2024-01-20 04:48:55.460644905 EST   3           30          0           nan         nan
2024-01-20 04:48:55.460644906 EST   nan         nan         nan         22.1        12
2024-01-20 04:48:55.460644907 EST   4           40          1           nan         nan
2024-01-20 04:48:55.460644908 EST   5           50          0           nan         nan
2024-01-20 04:48:55.460644909 EST   nan         nan         nan         76.3        13
2024-01-20 04:48:55.460644910 EST   6           60          0           nan         nan
2024-01-20 04:48:55.460644911 EST   nan         nan         nan         50          14
2024-01-20 04:48:55.460644912 EST   nan         nan         nan         17.3        15
2024-01-20 04:48:55.460644913 EST   7           70          0           nan         nan
2024-01-20 04:48:55.460644914 EST   8           80          1           nan         nan
2024-01-20 04:48:55.460644915 EST   9           90          0           **nan**         nan

manipulated data and reshaped in panda to achieve desired format/shape. looking for a faster way.

0

There are 0 answers