Here is my dataframe:
```
       RIGHT_SHORTNAME      Item_Name
0      S/BAG PKT SEMBAKO    S/BAG PKT SEMBAKO
1      ORAL B 123 SOFT2S    ORAL B 123 SOFT2S
2      ORAL B 123 SOFT2S    ORAL B 123 SOFT2S
3      CINDERELLA COTBUD    CINDERELLA COTBUD
4      PROCHIZ 10S 170GR    PROCHIZ 10S 170GR
...    ...                  ...
97163  TT MAX CHO 12X17GR   TT MAX CHO 12X17GR
97164  ICELAND VOD 350ML    ICELAND VOD 350ML
97165  SUNKIST GUAVA 1 LT   SUNKIST GUAVA 1 LT
97166  COSM FAN 12DAR       COSM FAN 12DAR
97167  BATHSALT MINERAL C   BATHSALT MINERAL C
```
I want to add a column named 'distance' using this code:
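```python
from rapidfuzz import process  # assumed library: its extractOne returns (choice, score, index)

def distance(a, b):
    _, z, _ = process.extractOne(str(a), [str(b)])
    return z

df['distance'] = distance(df['RIGHT_SHORTNAME'], df['Item_Name'])
```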
It yields this:
```
       RIGHT_SHORTNAME      Item_Name            distance
0      S/BAG PKT SEMBAKO    S/BAG PKT SEMBAKO    98.595506
1      ORAL B 123 SOFT2S    ORAL B 123 SOFT2S    98.595506
2      ORAL B 123 SOFT2S    ORAL B 123 SOFT2S    98.595506
3      CINDERELLA COTBUD    CINDERELLA COTBUD    98.595506
4      PROCHIZ 10S 170GR    PROCHIZ 10S 170GR    98.595506
...    ...                  ...                  ...
97163  TT MAX CHO 12X17GR   TT MAX CHO 12X17GR   98.595506
97164  ICELAND VOD 350ML    ICELAND VOD 350ML    98.595506
97165  SUNKIST GUAVA 1 LT   SUNKIST GUAVA 1 LT   98.595506
97166  COSM FAN 12DAR       COSM FAN 12DAR       98.595506
97167  BATHSALT MINERAL C   BATHSALT MINERAL C   98.595506
```
When I checked with `df['distance'].describe()`, it turned out that every value in `df['distance']` is the same. Can anybody help me?
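This is because `a` and `b` are the whole `RIGHT_SHORTNAME` and `Item_Name` Series, not individual values, so `str(a)` and `str(b)` each turn an entire column into one long string. Your `distance` function is therefore called only once and returns a single value. Assigning that scalar to the new `distance` column broadcasts it to every row, which is why the column holds the same value everywhere. (The two strings are nearly identical, differing mainly in the `Name:` line of the Series representation, which is probably why the score is just under 100.)

`process.extractOne(query, choices)` expects one string as `query` and a list of candidate strings as `choices`, so you need to call `distance` once per row rather than once for the whole column.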