Record Linkage matching two different datasets Python

Asked by econ_grad12345 on 21 May 2023 at 19:28 (100 views)

I am trying to fuzzy match records using recordlinkage in Python. I am matching on business name and zip code across two different datasets.

Here is the code I am using:

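import pandas as pd
import recordlinkage

# Company names are used as the index of each DataFrame
reference_usa = pd.read_csv('all_reference_usa.csv', index_col='companyname')
oc_sample = pd.read_csv('oc_sample.csv', index_col='name')

# Build every possible pair of records between the two datasets
indexer = recordlinkage.Index()
indexer.full()

candidates = indexer.index(reference_usa, oc_sample)
print(len(candidates))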

And here is the error:

ValueError('index of DataFrame is not unique')

The call to indexer.index() fails with "index of DataFrame is not unique". This happens because I use the company name as the index, and there may be several companies with the same name in different locations. Is it possible to ignore this check, or can I add zip code as an additional index column? Ideally, I would like to match the companies on both name and zip code.

There are 0 answers