Python: Comparing 2 sets of data, yield best match and match %

Question

919 views · Asked by Andrew G. on 05 January 2017 at 02:58

I've seen lots of Q&A on this topic, but none contain the type of output I'm looking for. Any words of wisdom on this would be very much appreciated!

I have 2 lists. Each list contains 1 column of the form Full Name|University (i.e., name and university, concatenated and separated by a pipe).
There's not always an exact match, because of nicknames and university abbreviations. I want to compare each record in list 1 with every record in list 2 and find the closest match.
I then want to produce an output file with 3 columns: every item from list 1, the closest match from list 2, and the match %.

Does anyone have sample code they could share? Thanks!

There is 1 answer

**David Whitlock** · Accepted Answer · 2017-01-05T05:02:44+00:00

To get you started, here is an approach that finds exact matches on either the full name or the university. You could extend it to include fuzzy search using a library like fuzzywuzzy.

For both lists, split each string into a [full name, university] list (if some of the strings don't contain the '|' character, you might need to wrap this in a try/except block or an if statement):

    new_list = [item.split('|') for item in old_list]
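If some strings may lack the '|' character, one possible way to handle them is sketched below. It swaps the try/except or if statement for str.partition, which never raises and simply leaves the university empty. The helper name split_record and the sample data are made up for illustration.

```python
def split_record(record):
    """Split 'Full Name|University' into a [name, university] list."""
    # str.partition returns (before, separator, after); if '|' is absent,
    # the separator and the part after it are both empty strings.
    name, _sep, university = record.partition('|')
    return [name.strip(), university.strip()]


# Made-up sample data; the second record has no '|' separator.
old_list = ['Andy Smith|MIT', 'Jane Doe']
new_list = [split_record(item) for item in old_list]
```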
Then run the following to match on either element (assuming that the split version of one list is called list1 and the other is called list2). Note that it collects only the list1 entries, one for each list2 entry they match:

    matches = [val for val in list1 for item in list2 if val[0] == item[0] or val[1] == item[1]]
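The exact matching above does not yet give a closest match or a match %. Here is a minimal sketch of the fuzzy extension, assuming the standard library's difflib.SequenceMatcher as a stand-in for fuzzywuzzy (whose process.extractOne plays a similar role). It compares the whole 'Full Name|University' string; the function names, the sample records, and the CSV column names are all made up for illustration.

```python
import csv
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(records1, records2):
    """For each record in records1, find the most similar record in records2.

    Returns a list of (record, closest_record, score) tuples.
    """
    results = []
    for rec in records1:
        # Score rec against every candidate and keep the highest-scoring pair.
        score, closest = max((similarity(rec, other), other) for other in records2)
        results.append((rec, closest, round(score, 1)))
    return results


def write_matches(rows, path):
    """Write the three-column result to a CSV file at the given path."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['list1', 'best_match', 'match_pct'])
        writer.writerows(rows)


# Made-up sample data in the 'Full Name|University' format.
raw1 = ['Andy Smith|MIT', 'Jane Doe|Stanford Univ']
raw2 = ['Jane Doe|Stanford University', 'Andrew Smith|MIT',
        'Robert Brown|Yale University']

rows = best_matches(raw1, raw2)
```

Calling write_matches(rows, 'matches.csv') would then produce the output file. This compares every record in list 1 with every record in list 2, so the running time grows with the product of the two list lengths. Scoring the name and the university separately, as in the split step above, might cope better with heavy abbreviations, but that is a design choice to test against real data.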
