I'm trying to load a CSV file into my Jupyter Notebook. I managed to load the file, but some columns of the data hold text in Hebrew, and that text loads as gibberish.
The code I used is the following:
import pandas as pd
cars = pd.read_csv(r'C:\Users\MyName\Folder\number_of_cars.csv', encoding='cp862', sep='|')
I tried a few different encodings that work with Hebrew, like cp424 / cp856 / cp1255 / iso8859_8, but got this error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x73 **in position 2**: character maps to <undefined>
The only encodings that worked were cp862 and latin-1 (not sure if latin-1 even works with Hebrew), but both return gibberish instead of Hebrew text.
Edit: also tried utf-8 and got this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 309: invalid continuation byte
I'm not a programmer; my experience with Python is only about analyzing data.
You can view the data set here: https://data.gov.il/dataset/private-and-commercial-vehicles/resource/053cea08-09bc-40ec-8f7a-156f0677aff3
The particular file (download version) is encoded in ANSI 1255, so your cp1255 should work. But! The file has 3 errors that prevent correct parsing in that code page. Example: at 06161062, after 1KD, there is byte 0x9F. You can handle conversion errors in Python.
Useful information is available in the read_csv documentation.
See also the list of ways to handle encoding errors.
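As a rough illustration of what those error handlers do, here is a minimal sketch using only the standard library. The byte string is a made-up sample (not taken from the real file): the text 1KD followed by byte 0x9F, which Python's cp1255 codec leaves unmapped.

```python
# Made-up sample (an assumption, not real file content):
# the ASCII text "1KD" followed by byte 0x9F, which cp1255 does not map.
raw = b"1KD\x9f"

try:
    raw.decode("cp1255")  # default handler is errors="strict"
    strict_result = "decoded"
except UnicodeDecodeError as exc:
    strict_result = f"failed at byte {exc.start}"

# Three of the standard handlers applied to the same bytes.
replaced = raw.decode("cp1255", errors="replace")          # bad byte becomes U+FFFD
ignored = raw.decode("cp1255", errors="ignore")            # bad byte is dropped
escaped = raw.decode("cp1255", errors="backslashreplace")  # bad byte kept as the text \x9f

print(strict_result, repr(replaced), repr(ignored), repr(escaped))
```

With "replace" you can still see where the bad bytes were, while "ignore" removes them silently, so the choice depends on whether you want to find and fix those spots later.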
The following slight change works here. Choose whichever error handler suits you best.
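A sketch of what such a change could look like, assuming pandas 1.3 or newer (where read_csv accepts the encoding_errors parameter). So that it runs on its own, it reads a made-up two-line sample from memory instead of the real file; the Hebrew column names and the row are invented for illustration only.

```python
import io

import pandas as pd

# Made-up stand-in for the real file (an assumption): a Hebrew header
# encoded as cp1255, then one row containing the unmapped byte 0x9F.
sample = "מספר|דגם\n".encode("cp1255") + b"06161062|1KD\x9f\n"

cars = pd.read_csv(
    io.BytesIO(sample),         # swap in the path to number_of_cars.csv
    encoding="cp1255",
    encoding_errors="replace",  # or "ignore" / "backslashreplace"
    sep="|",
    dtype=str,                  # keep values such as 06161062 as text
)
print(cars)
```

For the real data, pass the file path from the question in place of io.BytesIO(sample). The dtype=str argument is optional; it is only there so that leading zeros survive.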