UnicodeDecodeError when using apply on a column for each row

I have a DataFrame and I want to encode each word in a column using Soundex. I have to split the text first, because Soundex only encodes the first word.

When I run this line of code, I get the following error:

table['soundex'] = table['name'].apply(lambda x:' '.join([jellyfish.soundex(i) for i in x.split()]))

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: invalid continuation byte

When I apply it to other columns it works, and they all have the same data type.

My data source is a database, and I created the name column through cleansing steps; that is, it does not come directly from the data source.

Most of the solutions for UnicodeDecodeError deal with reading CSV files, and in my case I do not know what causes the error.

random sample of data and expected output:

name                       soundex
hospital food              H213 F300
good after noon            G300 A136 N500
hi                         H000

any help?

There is 1 answer

Answered by Fatima

I solved it by removing the non-English (non-ASCII) characters with this line of code:

table.name=table.name.str.encode('ascii', 'ignore').str.decode('ascii')
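To show what that line does to each value, here is a minimal pure-Python sketch of the same encode/decode round trip on single strings. The helper name `strip_non_ascii` and the sample strings are made up for illustration; they are not from my data. Note that the characters are dropped, not transliterated, so part of a word can disappear.

```python
def strip_non_ascii(text):
    """Drop every character that cannot be encoded as ASCII."""
    # errors="ignore" silently discards the characters that fail to encode
    return text.encode("ascii", "ignore").decode("ascii")


# Hypothetical sample values: one clean, one with an accented letter,
# one starting with a Greek letter (U+03A9, which is b"\xce\xa9" in UTF-8).
samples = ["hospital food", "caf\u00e9 noon", "\u03a9mega hi"]

cleaned = [strip_non_ascii(s) for s in samples]
print(cleaned)
```

The Greek sample is only an example of a character whose UTF-8 encoding starts with byte 0xce, the byte named in the error message; I have not confirmed that this is what was in the column.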

reference:

https://stackoverflow.com/a/56744855/10718214
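Before dropping characters, it may help to see which values would be affected. The following is a rough sketch on a plain list of strings; `non_ascii_chars` and the sample names are hypothetical, and it assumes every value is a `str` (missing values would need to be handled separately).

```python
def non_ascii_chars(text):
    """Return the distinct non-ASCII characters in text, sorted."""
    return sorted({ch for ch in text if ord(ch) > 127})


# Hypothetical sample values; only the last one contains a non-ASCII character.
names = ["hospital food", "good after noon", "\u03a9mega hi"]

# Map the position of each affected value to the characters that would be lost.
offenders = {
    i: non_ascii_chars(name)
    for i, name in enumerate(names)
    if not name.isascii()  # str.isascii() requires Python 3.7+
}
print(offenders)
```

On the DataFrame from the question, the same helper could presumably be used as `table['name'].apply(non_ascii_chars)`, again assuming the column holds only strings.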