I've fuzzy-matched each title in a list of 20,000+ movie titles against every other title in the list, which returns a similarity value for each pair:
```python
from fuzzywuzzy import fuzz

titles = ['Scary Movie', 'Happy Movie', 'Sappy Movie', 'Crappy Movie']

print(fuzz.ratio(titles[1], titles[2]))  # 91 out of 100; a higher value means a closer match

# Score every title against every title (including itself)
for x in titles:
    for y in titles:
        fuzzed = fuzz.ratio(x, y)
        print("value for %r and %r is %r" % (x, y, fuzzed))
```
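The nested loops score every pair twice, (x, y) and (y, x), and also score each title against itself. A minimal sketch, assuming `itertools.combinations` and a hypothetical `score` helper that uses the standard-library `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio` (swap in `fuzz.ratio` if fuzzywuzzy is installed), stores the score of each unordered pair exactly once in a dict:

```python
from difflib import SequenceMatcher
from itertools import combinations


def score(a, b):
    """Hypothetical stand-in for fuzz.ratio: a 0-100 similarity from difflib."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


titles = ['Scary Movie', 'Happy Movie', 'Sappy Movie', 'Crappy Movie']

# combinations() yields each unordered pair once and never pairs a title
# with itself, so n titles need n * (n - 1) / 2 comparisons.
scores = {(x, y): score(x, y) for x, y in combinations(titles, 2)}

for (x, y), value in scores.items():
    print("value for %r and %r is %r" % (x, y, value))
```

For 20,000 titles that is 20,000 × 19,999 / 2 = 199,990,000 comparisons instead of the 400,000,000 the nested loops perform, and any pair's score can be looked up later as `scores[(x, y)]`.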
How can I organize this data efficiently? More specifically, how can I group matches together based on their match value?
Capturing the return values from the nested loops and then packaging them with x and y into tuples or lists is obviously redundant and messy. I attempted an implementation using classes, but I'm missing something.
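One way to read "group together based on their match value" is to put two titles in the same group whenever their score reaches a chosen threshold, following chains of such matches. A minimal sketch of that idea (single-linkage clustering with a union-find structure), assuming a hypothetical `group_titles` function, an arbitrary threshold, and `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio`:

```python
from difflib import SequenceMatcher
from itertools import combinations


def score(a, b):
    """Hypothetical stand-in for fuzz.ratio: a 0-100 similarity from difflib."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def group_titles(titles, threshold):
    """Hypothetical helper: titles share a group if a chain of pairs,
    each scoring >= threshold, connects them (single linkage)."""
    parent = {t: t for t in titles}  # every title starts as its own group

    def find(t):
        # Walk up to the group's root, halving the path as we go
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for x, y in combinations(titles, 2):
        if score(x, y) >= threshold:
            parent[find(x)] = find(y)  # merge the two groups

    groups = {}
    for t in titles:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())


titles = ['Scary Movie', 'Happy Movie', 'Sappy Movie', 'Crappy Movie']
print(group_titles(titles, threshold=85))
```

Single linkage is cheap to compute but can chain loosely related titles into one large group (A matches B, B matches C, so A and C end up together even if they score low), so the threshold usually needs tuning on real data.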
Using list comprehensions and itertools.product
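A minimal sketch of this approach, assuming a hypothetical `score` helper built on `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio`: `itertools.product(titles, repeat=2)` replaces the two nested loops, a list comprehension builds `(score, x, y)` tuples, and sorting puts the closest matches first.

```python
from difflib import SequenceMatcher
from itertools import product


def score(a, b):
    """Hypothetical stand-in for fuzz.ratio: a 0-100 similarity from difflib."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


titles = ['Scary Movie', 'Happy Movie', 'Sappy Movie', 'Crappy Movie']

# product(titles, repeat=2) yields the same (x, y) pairs as the nested loops;
# the x < y filter drops self-pairs and the mirrored (y, x) duplicates.
matches = [(score(x, y), x, y)
           for x, y in product(titles, repeat=2)
           if x < y]

# Tuples sort by score first, so reverse=True lists the best matches first
matches.sort(reverse=True)

for value, x, y in matches:
    print("%3d  %s / %s" % (value, x, y))
```

The `x < y` filter assumes the titles are unique; exact duplicates in the list would be skipped rather than reported as a perfect match.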
A nice and lazy solution using toolz
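A minimal sketch of a toolz-based version, assuming `toolz.groupby(key, seq)`, which returns a dict mapping each key to the list of items that produced it. The pair stream from `itertools.combinations` is lazy; `groupby` consumes it once, scoring each pair and bucketing pairs by score. The `score` helper is a hypothetical `difflib` stand-in for `fuzz.ratio`, and the fallback `groupby` is only a minimal stand-in for when toolz is not installed.

```python
from difflib import SequenceMatcher
from itertools import combinations

try:
    from toolz import groupby  # groupby(key, seq) -> {key(item): [items]}
except ImportError:
    # Minimal stand-in with the same behaviour, used only if toolz is absent
    def groupby(key, seq):
        result = {}
        for item in seq:
            result.setdefault(key(item), []).append(item)
        return result


def score(a, b):
    """Hypothetical stand-in for fuzz.ratio: a 0-100 similarity from difflib."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


titles = ['Scary Movie', 'Happy Movie', 'Sappy Movie', 'Crappy Movie']

# Lazily generate each unordered pair once, then bucket the pairs by score
by_score = groupby(lambda pair: score(*pair), combinations(titles, 2))

# Print the buckets from the closest matches down
for value in sorted(by_score, reverse=True):
    print(value, by_score[value])
```

Keying on the score directly gives exact-value groups; keying on something coarser, such as `score(*pair) // 10`, would bucket pairs into score bands instead.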