I have a DataFrame and I want to encode each word in a column using Soundex.
Because Soundex only encodes the first word of a string, I split each value into words first.
When I apply the following line of code, I get this error:
table['soundex'] = table['name'].apply(lambda x:' '.join([jellyfish.soundex(i) for i in x.split()]))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: invalid continuation byte
When I apply the same code to other columns it works, and they all have the same data type.
My data source is a database, and I created the name column myself through cleansing steps; it is not an original column from the data source.
Most of the solutions for UnicodeDecodeError deal with reading CSV files, so in my case I do not know what causes the error.
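Since the same code works on other columns of the same type, one thing worth checking is whether the failing column contains non-ASCII characters (the byte 0xce in the message is, for example, the first byte of many Greek letters in UTF-8). This is a guess at the cause, not a confirmed diagnosis. A minimal sketch of such a check, where `find_non_ascii` and the sample values are hypothetical and only the standard library is used:

```python
def find_non_ascii(values):
    """Return (position, value, offending characters) for each non-ASCII string."""
    hits = []
    for pos, value in enumerate(values):
        if not isinstance(value, str):
            continue  # skip None/NaN and other non-string entries
        # Collect every distinct character outside the 7-bit ASCII range.
        bad = sorted({ch for ch in value if ord(ch) > 127})
        if bad:
            hits.append((pos, value, bad))
    return hits


# Made-up sample values, not taken from the real table.
sample = ["hospital food", "good after noon", "caf\u00e9 \u03a9mega", None]
for pos, value, bad in find_non_ascii(sample):
    print(pos, repr(value), bad)
```

With a DataFrame, the values could be passed in as `table['name'].tolist()`; any rows reported are candidates for the ones that break the encoder.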
A random sample of the data and the expected output:
name             soundex
hospital food    H213 F300
good after noon  G300 A136 N500
hi               H000
Any help?
I solved it by removing the non-English characters from the column before applying Soundex, following this answer:
https://stackoverflow.com/a/56744855/10718214
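The exact line used is not shown here, so the following is only a sketch of one common way to do this, assuming "non-English" means non-ASCII. The helpers `strip_non_ascii` and `encode_words` are hypothetical names, and the encoder is passed in as a parameter so the example runs without jellyfish installed:

```python
def strip_non_ascii(text):
    """Drop every character that cannot be represented in ASCII."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def encode_words(text, encoder):
    """Clean the text, then apply `encoder` to each whitespace-separated word."""
    cleaned = strip_non_ascii(text)
    return " ".join(encoder(word) for word in cleaned.split())


# str.upper stands in for a phonetic encoder such as jellyfish.soundex.
print(encode_words("caf\u00e9 \u03a9 food", str.upper))  # CAF FOOD
```

With jellyfish available, the column could then be encoded with `table['name'].apply(lambda x: encode_words(x, jellyfish.soundex))`. Note that this approach deletes accented letters outright (for example "café" becomes "caf"), which may change the resulting codes.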