Python Pandas - csv file is not correctly split into column as a result of quotechars?

42 views Asked by At

I've been struggling with the following problem for a while and can't seem to find the right solution anywhere.

The issue is as follows: I have a csv file that contains 13 columns, when I read the CSV file into Python Pandas everything is placed within a single column and I can't seem to properly split the dataset over the 13 columns.

I have tried many things but this is the most recent code I have used and the output resulting from it:

import pandas as pd

file_path = "C:\\Users\\User23\\Downloads\\Detail_Table_Test(2).csv"
    
df = pd.read_csv(file_path, sep=',', header = 0, engine = 'python', quotechar = '"')

print(df.head())
SESSION ID,"ACTIVITY DATE","ACTIVITY TIME","SEQUENCE KEY","CATEGORY NAME","DETAIL 1","DETAIL 2","DETAIL 3","DETAILS","TWO CATEGORIES BEFORE","FROM CATEGORY NAME","TO CATEGORY NAME","TWO CATEGORIES AFTER"
0  1111111,"2023-11-11","02:05:06","1","Landing s...                                                                                                                                                         
1  1111111,"2023-11-11","02:05:10","2","Data gath...                                                                                                                                                         
2  1111111,"2023-11-11","02:05:19","3","Questionn...                                                                                                                                                         
3  1111111,"2023-11-11","02:05:29","4","Questionn...                                                                                                                                                         
4  1111111,"2023-11-11","04:35:34","5","Data gath...  

Here is a line of data from the file:

111111,"2023-11-12","18:09:34","14","Questionnaire fill-in","Company ESG Assessment (new)","Diversity, Equity and Inclusion","Assessment","Company ESG Assessment (new)- Diversity, Equity and Inclusion: Assessment","Data gathering","Questionnaire fill-in","Questionnaire fill-in","Questionnaire fill-in"

I have tried specifying the quotechars = '"'

I have tried the following line of code: sep=', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))'

Thanks a 1000 times in advance :)

2

There are 2 answers

1
Pawan Tolani On

try this:

check=pd.read_csv(r"C:\Users\743622\Desktop\trial.txt",quoting=csv.QUOTE_NONE,quotechar = '"',sep='","',escapechar='\\',engine='python')
check[['SESSION ID','ACTIVITY DATE']]=check.iloc[:,0].str.split(',',expand=True)
check.drop(columns=['SESSION ID,"ACTIVITY DATE'],inplace=True)


    
0
Derby9421 On

Might be caused by incorrect delimiter / special characters exist in the file(s), or even diff encoding. Try below,

import csv
import pandas as pd

file_path = r"C:\\Users\\User23\\Downloads\\Detail_Table_Test(2).csv"

df = pd.read_csv(file_path, quoting=csv.QUOTE_MINIMAL, quotechar='"', escapechar='\\', engine='python')

print(df.head())