I am trying to fuzzy match business records using the recordlinkage library in Python. I am matching on business name and zipcode across two different datasets.
Here is the code I am using:
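import pandas as pd
import recordlinkage

# Load both datasets, using the business name column as the index
reference_usa = pd.read_csv('all_reference_usa.csv', index_col='companyname')
oc_sample = pd.read_csv('oc_sample.csv', index_col='name')
# Build every possible pair of records across the two datasets
indexer = recordlinkage.Index()
indexer.full()
candidates = indexer.index(reference_usa, oc_sample)
print(len(candidates))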
And here is the error:
ValueError('index of DataFrame is not unique')
The error says that the index of the DataFrame is not unique. This happens because I use the name column as the index, and there may be several companies with the same name but different locations. Is it possible to bypass this uniqueness check, or can I add zipcode as an additional index column? Ideally, I would like to match businesses on both company name and zipcode.