Pandas.read_csv ParserError '§' expected after '"' with sep = "§"


I have an issue with read_csv, and it's taking a lot of time to resolve.

I am working with texts that contain many special characters, so I looked for a character that does not occur in any of the texts and chose § as the delimiter when writing the CSV files, which store each text alongside its ID.

However, while reading the files, I am getting the following error. I could skip the bad lines, but in this case I cannot afford to lose any texts.

ParserError: '§' expected after '"'

Writing

df.to_csv('20231010.csv',
           index=False,
           sep='§',
           #header=None,
           quoting=csv.QUOTE_NONE,
           quotechar="",
           escapechar=" ")

Reading

data = pd.read_csv('20231010.csv', sep='§', encoding='utf-8')

1 answer

mozway (best answer)

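It doesn't make sense to disable quoting, and you don't even need a fancy delimiter character. With the default settings, pandas quotes any field that contains the separator or a quote character, so the texts survive the round trip: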

import pandas as pd

df = pd.DataFrame({'text1': ['abc"123§', 'def ,456'],
                   'text2': ['ghi`789', 'jkl|123']})

df.to_csv('20231010.csv', index=False)

CSV:

text1,text2
"abc""123§",ghi`789
"def ,456",jkl|123

Importing again:

df2 = pd.read_csv('20231010.csv')
print(df2)

Output:

      text1    text2
0  abc"123§  ghi`789
1  def ,456  jkl|123
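
If you want to confirm that no text is lost or altered, you can compare the values before and after the round trip. The following is only a minimal sketch: the column names and sample texts are made up for illustration, and an in-memory buffer stands in for the file on disk.

```python
import io

import pandas as pd

# Hypothetical sample texts containing a quote, the § sign, and a comma.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ['say "hi"', "section § 5", "a, b, c"],
})

# Write with the default settings (comma separator, minimal quoting).
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read back with the default settings.
buffer.seek(0)
df2 = pd.read_csv(buffer)

# Compare the texts before and after the round trip.
round_trip_ok = df["text"].tolist() == df2["text"].tolist()
print(round_trip_ok)
```

The same comparison should work with a real file path in place of the buffer.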

Pandas can generally export a DataFrame to CSV and import it again without altering the data. The most common things that could cause a change are:

  • the default inclusion of the index in to_csv, which read_csv then reads back as an extra column
  • conversion of specific strings to NaN (e.g. NULL/NA), which can be annoying if those strings have a different meaning in your context

You can avoid these issues by using index=False in to_csv (as you did) and keep_default_na=False in read_csv.
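
To illustrate the NaN conversion, here is a small sketch with made-up texts that happen to match two of the default missing-value markers. The column names and the in-memory buffer are assumptions for the example, not part of the original data.

```python
import io

import pandas as pd

# Hypothetical texts that happen to match pandas' default NaN markers.
df = pd.DataFrame({"id": [1, 2], "text": ["NA", "NULL"]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Default behaviour: "NA" and "NULL" are parsed as missing values.
buffer.seek(0)
with_defaults = pd.read_csv(buffer)

# With keep_default_na=False the strings are kept as they were written.
buffer.seek(0)
as_text = pd.read_csv(buffer, keep_default_na=False)

print(with_defaults["text"].isna().tolist())
print(as_text["text"].tolist())
```

Note that keep_default_na=False applies to every column, so genuinely empty cells in text columns come back as empty strings rather than NaN.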