Trying to load csv file with Hebrew text but gets gibberish

308 views Asked by Liad Traube At 10 August 2023 at 13:52

I'm trying to load a CSV file into my Jupyter Notebook. I managed to load the file, but some columns contain Hebrew text, and that text loads as gibberish.

The code I used is the following:

import pandas as pd

cars = pd.read_csv(r'C:\Users\MyName\Folder\number_of_cars.csv', encoding='cp862', sep='|')

I tried a few different encodings that support Hebrew, such as cp424, cp856, cp1255 and iso8859_8, but got an error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x73 in position 2: character maps to <undefined>

The only encodings that loaded without an error were cp862 and latin-1 (I'm not sure latin-1 even supports Hebrew), but both return gibberish instead of Hebrew text.

Edit: I also tried utf-8 and got this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 309: invalid continuation byte

I'm not a programmer; my experience with Python is limited to analyzing data.

You can view the dataset here: https://data.gov.il/dataset/private-and-commercial-vehicles/resource/053cea08-09bc-40ec-8f7a-156f0677aff3

**MyICQ** · Answer 1 · 2023-08-10T20:27:55+00:00

This particular file (the download version) is encoded in ANSI 1255, so your cp1255 should work. But the file has 3 errors that prevent correct parsing in that code page. Example: at 06161062, after 1KD, there is a byte 0x9F.
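To see where such bytes sit, one option is to decode the raw bytes strictly and record every position that fails. The following is only a sketch: the helper name `find_undecodable` is hypothetical, it reads the whole input into memory, and it assumes a single-byte code page such as cp1255.

```python
def find_undecodable(data: bytes, encoding: str = "cp1255") -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte that `encoding` cannot decode.

    Hypothetical helper, not part of pandas. Each failure restarts the
    strict decode just past the offending byte.
    """
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode(encoding)  # strict decoding is the default
            break                        # the rest of the data decodes cleanly
        except UnicodeDecodeError as exc:
            offset = pos + exc.start     # exc.start is relative to the slice
            bad.append((offset, data[offset]))
            pos = offset + 1
    return bad


# Small demonstration: valid cp1255 Hebrew with two stray 0x9F bytes.
sample = "שלום".encode("cp1255") + b"\x9f" + b"abc" + b"\x9f"
for offset, value in find_undecodable(sample):
    print(f"offset {offset}: byte 0x{value:02X}")
```

For the real file, the bytes could be read with `open(path, 'rb').read()` and passed to the same helper; with only a handful of bad bytes the repeated slicing stays cheap.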

You can handle conversion errors in Python.

Useful information is available in the read_csv documentation, under the encoding_errors parameter.

Python's codecs documentation lists the available ways to handle encoding errors.

The following slight change works here. Choose the error handler that suits you best.

import pandas as pd

cars = pd.read_csv(r'hebrew__cars__download.csv',
                   encoding='cp1255',
                   sep='|',
                   encoding_errors='backslashreplace')

# --- read some info
print("Shape: ", cars.shape)

#  Output for download:
#     Shape:  (3781124, 23)
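To help choose between the handlers, here is a minimal sketch of what a few of them do to one undecodable byte. The sample bytes are made up for illustration (a Hebrew word, a stray 0x9F, then some ASCII) and are not taken from the dataset; `decode_with` is a hypothetical helper. As far as I know, `encoding_errors` in read_csv accepts the same handler names as the `errors` argument of `bytes.decode`.

```python
# Made-up sample: valid cp1255 Hebrew, one byte cp1255 does not define, ASCII.
raw = "שלום".encode("cp1255") + b"\x9f" + b" 1KD"


def decode_with(handler: str) -> str:
    """Decode the sample as cp1255 using the given error handler."""
    return raw.decode("cp1255", errors=handler)


# These handlers never raise; they differ in what replaces the bad byte.
for handler in ("replace", "ignore", "backslashreplace"):
    print(f"{handler:>16}: {decode_with(handler)!r}")

# 'strict' is the default and raises on the first bad byte.
try:
    decode_with("strict")
except UnicodeDecodeError as exc:
    print("strict raised:", exc.reason)
```

'backslashreplace' keeps the bad byte visible as an escape sequence, which makes the affected rows easy to find later; 'ignore' silently drops it, and 'replace' substitutes the U+FFFD replacement character.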
