Compare two text columns to measure their similarity in a dataframe in python

Asked by mehdi samimi on 03 May 2022 at 01:18 (393 views)

I want to compare column A with C and column B with C, measure the similarity of each pair, and then report whichever of A or B is more similar to C.
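As a minimal sketch of the comparison for a single row, the code below uses Python's standard-library `difflib.SequenceMatcher` (the SequenceMatcher mentioned in the question). Two things are assumptions rather than part of the question: both strings are lower-cased first, because the sample data mixes cases ("HENRY THEISEN" vs "Henry J. Theisen"), and ties go to A. The helper names `similarity` and `more_similar` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return SequenceMatcher's ratio in [0, 1], ignoring letter case (an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def more_similar(a: str, b: str, target: str) -> str:
    """Return whichever of a or b is more similar to target; a wins ties."""
    return a if similarity(a, target) >= similarity(b, target) else b


print(more_similar("HENRY THEISEN", "SCOTT ULLEM", "Henry J. Theisen"))
```

`ratio()` returns 2*M/T, where M is the number of matching characters and T is the total length of both strings, so identical strings score 1.0.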

```python
import pandas as pd

df = pd.DataFrame([['JAMES LIKEN', 'LINDEN R. EVANS', 'LINDEN R. EVANS'], ['HENRY THEISEN', 'SCOTT ULLEM', 'Henry J. Theisen']])
df.columns = ['A', 'B', 'C']  # name the three text columns
```

The result should have three columns: the first two hold the similarity ratios (A vs C and B vs C), and the third holds the value from column A or B, whichever is more similar to C. I tried fuzz.partial_ratio and SequenceMatcher, using apply with a lambda to run the function on each row, but it raised an error.
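A sketch of the row-wise version follows. It uses `difflib.SequenceMatcher` rather than `fuzz.partial_ratio`, lower-cases both strings before comparing (an assumption), lets A win ties, and names the output columns `A_C_ratio`, `B_C_ratio` and `best_match` (hypothetical names). The question doesn't include the error message, so its cause can't be confirmed, but a frequent pitfall with this pattern is calling `apply` without `axis=1`, in which case the function receives whole columns instead of rows.

```python
from difflib import SequenceMatcher

import pandas as pd

df = pd.DataFrame(
    [['JAMES LIKEN', 'LINDEN R. EVANS', 'LINDEN R. EVANS'],
     ['HENRY THEISEN', 'SCOTT ULLEM', 'Henry J. Theisen']],
    columns=['A', 'B', 'C'],
)


def ratio(x: str, y: str) -> float:
    # Case-insensitive SequenceMatcher ratio in [0, 1]; lower-casing is an assumption.
    return SequenceMatcher(None, str(x).lower(), str(y).lower()).ratio()


def compare_row(row: pd.Series) -> pd.Series:
    # Score A vs C and B vs C, then keep whichever of A or B scores higher (A wins ties).
    a_c = ratio(row['A'], row['C'])
    b_c = ratio(row['B'], row['C'])
    best = row['A'] if a_c >= b_c else row['B']
    # Returning a Series makes apply build one output column per key.
    return pd.Series({'A_C_ratio': a_c, 'B_C_ratio': b_c, 'best_match': best})


# axis=1 passes each row (not each column) to compare_row.
result = df.apply(compare_row, axis=1)
print(result)
```

To use fuzzywuzzy instead, the body of `ratio` could return `fuzz.partial_ratio(x, y)`. That function returns an integer from 0 to 100 rather than a float from 0 to 1, but because both pairs are scored on the same scale, the comparison in `compare_row` works unchanged.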
