I've seen lots of Q&A on this topic, but none contain the type of output I'm looking for. Any words of wisdom on this would be very much appreciated!
- I have 2 lists. Each list contains 1 column of Full Name|University values (i.e., name and university, concatenated and separated by a pipe).
- There's not always an exact match, due to nicknames and university abbreviations. I want to compare each record in list 1 with each record in list 2, and find the closest match.
- I then want to produce an output file with 3 columns: every item from list 1, the closest match from list 2, and the match %.
Does anyone have sample code they could share? Thanks!
To get you started, here is an approach that finds exact matches on either the full name or the university. You could extend it to fuzzy matching using a library like fuzzywuzzy:
For both lists, split each string into a [full name, university] list (if some of the strings don't contain the '|' character, you might need to wrap this in a try/except or an if statement):
    new_list = [item.split('|') for item in old_list]
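If some records might be missing the pipe, one way to handle that without a try/except is str.partition, which always returns three parts. This is only a sketch: the helper name split_record and the sample data are made up for illustration, and using an empty string for a missing university is an assumption you may want to change.

```python
def split_record(record):
    """Split 'Full Name|University' into [name, university].

    A record with no '|' gets an empty university, so later code can
    safely index [1] without an IndexError.
    """
    name, _, university = record.partition('|')
    return [name.strip(), university.strip()]


# Hypothetical sample data, not taken from the question
old_list = ['Jane Doe|MIT', 'John Smith']
new_list = [split_record(item) for item in old_list]
```

Every element of new_list then has exactly two entries, so the matching step can index both positions.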
Run the following command to match on either element (assuming that one list is called list1 and the other list is called list2):
    matches = [val for val in list1 for item in list2 if val[0] == item[0] or val[1] == item[1]]
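For the fuzzy part, here is a rough sketch of the closest-match output the question asks for (item from list 1, closest item from list 2, match %). It uses difflib.SequenceMatcher from the standard library rather than fuzzywuzzy, so nothing needs to be installed, and the scores will not be identical to what fuzzywuzzy reports. It compares the original unsplit strings. The names raw1, raw2, similarity, closest_matches and write_matches, the column headers, and the sample data are all made up for illustration.

```python
import csv
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score for two strings, ignoring case."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def closest_matches(raw1, raw2):
    """For each record in raw1, find the highest-scoring record in raw2."""
    rows = []
    for record in raw1:
        best = max(raw2, key=lambda candidate: similarity(record, candidate))
        rows.append((record, best, round(similarity(record, best), 1)))
    return rows


def write_matches(rows, path):
    """Write the (item, closest match, match %) rows to a CSV file."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['list1_item', 'closest_match', 'match_pct'])
        writer.writerows(rows)


# Hypothetical sample data, not taken from the question
raw1 = ['Bob Smith|MIT', 'Kate Jones|UCLA']
raw2 = ['Katherine Jones|UCLA', 'Robert Smith|MIT']
rows = closest_matches(raw1, raw2)
```

Calling write_matches(rows, 'matches.csv') would then produce the 3-column file (the file name is just an example). This compares every record in list 1 against every record in list 2, which is fine for small lists but gets slow for large ones. It also scores the whole string at once, so if names matter more than universities you may want to split the records first and score the two parts separately.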