Python 3.6 Pandas Difflib Get_Close_Matches to filter a dataframe with user input


Using a CSV loaded into a pandas DataFrame, I am trying to search one column of the df for entries similar to user-supplied input. I have never used difflib before, and my attempts have ended either in a TypeError: object of type 'float' has no len() or in an empty list [].

import difflib
import pandas as pd

df = pd.read_csv("Vendorlist.csv", encoding="ISO-8859-1")
word = input("Enter a vendor: ")

def find_it(w):
    w = w.lower()
    return difflib.get_close_matches(w, df.vendorname, n=50, cutoff=.6)

alternatives = find_it(word)
print(alternatives)

The error seems to occur at "return difflib.get_close_matches(w, df.vendorname, n=50, cutoff=.6)"

I am trying to find entries similar to "word" in a column called 'vendorname'.

Help is greatly appreciated.


There are 2 answers

2
piRSquared On BEST ANSWER

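Your vendorname column contains values that are not strings. The error message shows that at least one entry is a float, most likely a NaN from an empty cell, and difflib calls len() on every candidate it compares.

Convert the column to strings in your return statement: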

return difflib.get_close_matches(w, df.vendorname.astype(str), n=50, cutoff=.6)

The full script then becomes:

import difflib
import pandas as pd

df = pd.read_csv("Vendorlist.csv", encoding="ISO-8859-1")
word = input("Enter a vendor: ")

def find_it(w):
    # Lowercase the user input before comparing
    w = w.lower()
    # astype(str) ensures every candidate is a string, so len() works on it
    return difflib.get_close_matches(w, df.vendorname.astype(str), n=50, cutoff=.6)

alternatives = find_it(word)
print(alternatives)
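
To see why the conversion helps, here is a minimal sketch that reproduces the TypeError without the CSV. The small DataFrame is a made-up stand-in for Vendorlist.csv, and the NaN entry assumes the float in the error message comes from an empty cell. Note that astype(str) turns NaN into the literal text 'nan', so this sketch drops missing rows first; the dropna() step is an addition for illustration, not part of the answer above.

```python
import difflib

import numpy as np
import pandas as pd

# Hypothetical data standing in for Vendorlist.csv; NaN mimics an empty cell.
df = pd.DataFrame(
    {"vendorname": ["acme supply", "acme supplies", np.nan, "zenith tools"]}
)

# Passing the raw column fails as soon as difflib reaches the NaN (a float).
try:
    difflib.get_close_matches("acme suply", df.vendorname, n=50, cutoff=0.6)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# Drop missing rows, then make sure everything left is a string.
names = df.vendorname.dropna().astype(str)
matches = difflib.get_close_matches("acme suply", names, n=50, cutoff=0.6)
print(matches)
```

Whether to drop missing rows or keep them as 'nan' depends on your data; dropping them avoids a vendor search for "nan" returning a spurious hit.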

As @johnchase notes in the comments:

The question also mentions an empty list being returned. get_close_matches returns a list of matches; if no item matches within the cutoff, it returns an empty list. – johnchase
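
One possible cause of the empty list, sketched below, is a case mismatch: the script lowercases the user input but not the column, and difflib compares characters case-sensitively. The vendor names here are invented, and the assumption that the CSV stores names in capitals may not hold for your file.

```python
import difflib

# Hypothetical vendor names stored in upper case.
vendors = ["ACME SUPPLY", "ZENITH TOOLS"]
word = "acme supply"  # what the user input looks like after .lower()

# Only the space character matches, so nothing reaches the 0.6 cutoff.
no_match = difflib.get_close_matches(word, vendors, n=50, cutoff=0.6)
print(no_match)

# Lowercasing the candidates as well puts both sides on the same footing.
lowered = [v.lower() for v in vendors]
match = difflib.get_close_matches(word, lowered, n=50, cutoff=0.6)
print(match)
```

With a DataFrame, the same idea would be df.vendorname.astype(str).str.lower(). Lowering the cutoff is the other knob to try if the names genuinely differ by more than a typo.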

0
supernooba On

I skipped the astype(str) in the return statement:

return difflib.get_close_matches(w, df.vendorname, n=50, cutoff=.6)

Instead, I passed dtype='string' to read_csv:

df = pd.read_csv("Vendorlist.csv", encoding="ISO-8859-1", dtype='string')
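
A minimal sketch of this approach, using an in-memory CSV as a made-up stand-in for Vendorlist.csv. The 'string' dtype needs pandas 1.0 or later. As far as I can tell, empty cells still come through as missing values rather than text under this dtype, so the sketch drops them before matching; that dropna() step is an assumption on my part, not something the answer states.

```python
import difflib
import io

import pandas as pd

# Hypothetical CSV content; the second row has an empty vendor name.
csv_text = "vendorname,city\nacme supply,Austin\n,Boston\nzenith tools,Denver\n"

# Read the vendor column with the dedicated string dtype.
df = pd.read_csv(io.StringIO(csv_text), dtype={"vendorname": "string"})

# Missing names are still missing, so remove them before calling difflib.
names = df["vendorname"].dropna()
matches = difflib.get_close_matches("zenith tool", names, n=50, cutoff=0.6)
print(matches)
```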