Name Matching in Python


We have a third-party 'tool' that finds similar names and assigns a similarity score to each pair of names.

I am supposed to mimic the tool's behavior as closely as possible. After searching the internet, I tried a string-distance approach, using fuzzywuzzy:

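from fuzzywuzzy import fuzz, process

# name is the query string; choices is the list of candidate names
matches = process.extractBests(
    name,
    choices,
    score_cutoff=50,
    scorer=fuzz.token_sort_ratio,
    limit=1,
)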

The results were close to the tool's. However, there are a few outliers, as highlighted below.

[Image: results with the outliers highlighted]

After further searching, I have come to understand that further refinement will need some sort of machine learning. I am a complete newbie in the machine learning world, so I am seeking advice on what I should try next to refine the code.

Thanks!


There are 2 answers

1
Michael Bianconi On

Take a look at the Jaccard and Levenshtein algorithms for fuzzy string matching. Both are relatively simple and can be implemented in about 40 or 50 lines of code.
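As a rough illustration of the two algorithms mentioned above, here is a minimal pure-Python sketch. The function names, the 0 to 100 scaling (chosen only to resemble fuzzywuzzy-style scores), and the use of character bigrams for the Jaccard sets are assumptions made for this example; they are not taken from the third-party tool or from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to 0..100 (an assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def bigrams(s: str) -> set:
    """Set of lowercase character bigrams (an assumed tokenization)."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the bigram intersection over the bigram union, scaled to 0..100."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        # Strings shorter than two characters have no bigrams;
        # fall back to a case-insensitive equality check.
        return 100.0 if a.lower() == b.lower() else 0.0
    return 100.0 * len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(levenshtein_similarity("Jon Smith", "John Smith"))
    print(jaccard_similarity("Jon Smith", "John Smith"))
```

Levenshtein is sensitive to character order and works well for typos, while a set-based Jaccard score ignores order, so the two can disagree on the same pair of names. Comparing both against the tool's scores on the outliers may show which one it resembles more.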

0
Yash M On

Take a look at the HMNI package. It is tailor-made for name matching.