How do I standardize the encoding of files from different sources with pd.read_csv and df.to_csv?

"I read files from different sources using pandas. Currently, I'm facing an issue with a Portuguese word 'Assembléia.' When I read this word with the 'utf-8' encoding and keep it in the DataFrame, everything works well. However, when I export it to a CSV, the word changes to 'assembléia.' What should I do? I tried changing the encoding to 'latin1,' and it worked fine. But now, when I try to encode another file with 'latin1' as well, the code throws a UnicodeEncodeError."

This is an example with latin1:


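import pandas as pd

data = {'Palavra': ['assembléia']}

df = pd.DataFrame(data)

# Write the DataFrame to CSV; no explicit encoding is passed, so pandas uses its default ('utf-8')
nome_arquivo_csv = r'C:\Users\user\OneDrive\Documents\cv - general\palavra_assembleia.csv'
df.to_csv(nome_arquivo_csv, index=False)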

In this example, the CSV file displays the word 'assembléia' instead of the expected 'Assembléia.' The problem is likely

Is there a way to standardize the encoding for all files?

There is 1 answer

gtomer

Try:

df.to_csv(nome_arquivo_csv, index=False, encoding='utf-8')
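
Note that 'utf-8' is already the default for to_csv, so if the output still looks wrong, the program used to open the CSV may be assuming a legacy code page (Excel is a common case). Writing with 'utf-8-sig' adds a byte order mark that such programs usually recognize. The UnicodeEncodeError with 'latin1' happens because latin1 can only represent 256 characters, while UTF-8 can encode any Unicode character. A minimal sketch of one way to standardize things follows: read each source with the first encoding that decodes, then always write UTF-8. The helper names, the candidate encoding list, and the temporary file names are assumptions made for illustration, not part of pandas.

```python
import os
import tempfile

import pandas as pd

# Candidate encodings, tried in order. This list is an assumption; adjust it
# to the sources you actually receive. latin1 goes last because it accepts
# any byte sequence and therefore never raises a decode error.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin1")


def read_csv_any_encoding(path, encodings=CANDIDATE_ENCODINGS, **kwargs):
    """Hypothetical helper: read a CSV with the first encoding that decodes."""
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError(f"none of {encodings} could decode {path}")


def write_csv_utf8(df, path, **kwargs):
    """Hypothetical helper: always write UTF-8 with a BOM ('utf-8-sig')."""
    df.to_csv(path, index=False, encoding="utf-8-sig", **kwargs)


# Round trip: a latin1 source file is read, then rewritten as UTF-8.
tmp_dir = tempfile.mkdtemp()
latin1_path = os.path.join(tmp_dir, "entrada_latin1.csv")
utf8_path = os.path.join(tmp_dir, "saida_utf8.csv")

with open(latin1_path, "w", encoding="latin1", newline="") as f:
    f.write("Palavra\nAssembléia\n")

df = read_csv_any_encoding(latin1_path)
write_csv_utf8(df, utf8_path)

# Reading the standardized file back with the same encoding it was written in
round_trip = pd.read_csv(utf8_path, encoding="utf-8-sig")
print(round_trip)
```

Trying encodings in order is only a heuristic: a file can decode without error under the wrong encoding and still contain wrong characters, so if a source documents its encoding, pass that encoding to read_csv directly.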