Applying Jaro-Winkler distance to two dataframes

Question

924 views · Asked by rshar on 27 November 2022 at 22:25

I have two dataframes of unequal length and would like to compare the strings in df2 against the strings in df1. Is it possible to apply the Jaro-Winkler distance to compute string similarity across the two dataframes, for example through a map/lambda function?

df1
Behavioral disorders
Behçet disease
AV-Block

df2
Behavioral disorder
Behçet syndrome

The desired output is:

name_left                 name_right            score   
Behavioral disorders      Behavioral disorder   0.933333
Behçet disease            Behçet syndrome       0.865342

The scores shown above are hypothetical. Any help is highly appreciated.

There is 1 answer

**mozway** · Answer 1 · 2022-11-27T22:47:54+00:00

Assuming that, for each name in df2, you want the df1 name with the highest score, and that the column in both dataframes is called "name":

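# pip install jaro-winkler
# https://pypi.org/project/jaro-winkler/
import pandas as pd
from jaro import jaro_winkler_metric as jw

# For each name in df2, score it against every name in df1 and keep the best pair.
# Note: max() needs key=...; a bare second argument is treated as another item to compare.
pd.DataFrame([[n2, *max([(n1, jw(n1, n2)) for n1 in df1['name']],
                        key=lambda x: x[1])]
              for n2 in df2['name']],
              index=df2.index,
              columns=['name_right', 'name_left', 'score']
            )[['name_left', 'name_right', 'score']]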
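
If installing a package is not an option, the same best-match lookup can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the answer author's code and not the `jaro-winkler` package's implementation: `jaro`, `jaro_winkler`, and `best_matches` are hypothetical helper names, and the sketch assumes the usual Winkler settings (prefix scale 0.1, prefix capped at 4 characters), so its scores may differ slightly from the package's.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix (assumed defaults)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def best_matches(left_names, right_names):
    """For each right-hand name, keep the best-scoring left-hand name."""
    rows = []
    for right in right_names:
        left, score = max(
            ((cand, jaro_winkler(cand, right)) for cand in left_names),
            key=lambda pair: pair[1],
        )
        rows.append({"name_left": left, "name_right": right, "score": score})
    return rows


# Sample data copied from the question.
left = ["Behavioral disorders", "Behçet disease", "AV-Block"]
right = ["Behavioral disorder", "Behçet syndrome"]
rows = best_matches(left, right)
for row in rows:
    print(row)
```

With the dataframes from the question, and assuming both have a "name" column, `pd.DataFrame(best_matches(df1['name'], df2['name']))` should give the `name_left`, `name_right`, `score` columns in that order. Like the answer above, this compares every pair of names, so the cost grows with the product of the two lengths.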
