Python recordlinkage identity


Similar issue to "R recordlinkage identity", but in Python. The algorithm generates new identifiers that do not reflect the correct identity of the records that were matched. This assumes deduplication within a single dataframe.

PS: It seems to work correctly in the data deduplication example.

There is 1 answer

Taiwotman On

The index that pandas generates automatically needs to be dropped and replaced by the column in the dataframe that you prefer to use as the identifier column.

The logic is:

Replace the index column with the identifier column in the dataframe.
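A minimal sketch of that step, using pandas only. The column name `record_id` and the sample rows are hypothetical, and the pair-building and name comparison are simple stand-ins for the indexing and comparison steps, not the library's own code. The point is that the candidate pairs are labelled with whatever is in the dataframe index, so once the identifier column is the index, the matches refer to the real identifiers instead of 0, 1, 2, ...

```python
from itertools import combinations

import pandas as pd

# Hypothetical data: "record_id" stands in for the preferred identifier column.
df = pd.DataFrame(
    {
        "record_id": ["A17", "B42", "C03"],
        "name": ["Ann Lee", "Ann Lee", "Bob Roy"],
    }
)

# Replace the auto-generated 0..n-1 index with the identifier column.
# verify_integrity=True raises an error if the identifiers are not unique.
df = df.set_index("record_id", verify_integrity=True)

# Stand-in for an indexing step: every pair of records, labelled by the index.
pairs = pd.MultiIndex.from_tuples(
    list(combinations(df.index, 2)), names=["record_id_1", "record_id_2"]
)

# Toy comparison rule: keep the pairs whose names agree exactly.
left = df.loc[pairs.get_level_values(0), "name"].to_numpy()
right = df.loc[pairs.get_level_values(1), "name"].to_numpy()
matches = pairs[left == right]

print(matches.tolist())  # [('A17', 'B42')]
```

If the identifier column is not unique, `set_index` with `verify_integrity=True` fails early, which is probably preferable to getting ambiguous match labels later.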