Similar issue to the R recordlinkage identity problem, but in Python. The algorithm generates new identities that do not reflect the actual identities of the records that were matched. This assumes deduplication within a single dataframe.
PS: It seems to work correctly in the data deduplication example.
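If the matched pairs have already been computed while the dataframe still had its default pandas index (0, 1, 2, ...), one way to recover the real identities is to look each label back up in the identity column. Below is a minimal sketch, assuming a hypothetical `rec_id` column holds the identity and that the matches are a pandas MultiIndex of default-index label pairs. `to_record_ids` is a hypothetical helper, not part of recordlinkage.

```python
import pandas as pd

# Hypothetical data: pandas assigns the default RangeIndex 0..3, while the
# identity we actually care about lives in the 'rec_id' column.
df = pd.DataFrame({
    "rec_id": ["rec-101", "rec-205", "rec-317", "rec-442"],
    "surname": ["smith", "jones", "smith", "white"],
})

# Hypothetical matched pairs expressed in default-index labels, i.e. the kind
# of pairs you get when deduplicating a dataframe that kept its RangeIndex.
matches = pd.MultiIndex.from_tuples([(0, 2)], names=["idx_1", "idx_2"])

def to_record_ids(pairs, frame, id_col):
    """Translate pairs of index labels into pairs of values taken from id_col."""
    first = frame.loc[pairs.get_level_values(0), id_col].to_numpy()
    second = frame.loc[pairs.get_level_values(1), id_col].to_numpy()
    return pd.MultiIndex.from_arrays(
        [first, second], names=[f"{id_col}_1", f"{id_col}_2"]
    )

print(list(to_record_ids(matches, df, "rec_id")))
# [('rec-101', 'rec-317')]
```

This only works as long as the dataframe has not been reordered or re-indexed since the pairs were generated, which is why fixing the index up front is usually the safer route.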
The index column that pandas generates needs to be dropped and replaced by the column in the dataframe that you want to use as the identity column.
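A minimal sketch of that idea in plain pandas, assuming a hypothetical `rec_id` column is the preferred identity. The pair generation (every unordered pair, like a full indexer) and the exact-match comparison are simple stand-ins written for illustration, not recordlinkage's own API. The point is only that once `set_index("rec_id")` replaces the default index, every pair produced downstream carries `rec_id` values instead of positional labels.

```python
import itertools
import pandas as pd

# Hypothetical example data; 'rec_id' is the column we want as the identity.
df = pd.DataFrame({
    "rec_id": ["rec-101", "rec-205", "rec-317", "rec-442"],
    "given_name": ["alice", "bob", "alice", "carol"],
    "surname": ["smith", "jones", "smith", "white"],
})

# Drop the default RangeIndex generated by pandas and use rec_id instead.
df = df.set_index("rec_id")

def candidate_pairs(frame):
    """All unordered pairs of index labels (a stand-in for a full indexer)."""
    pairs = list(itertools.combinations(frame.index, 2))
    return pd.MultiIndex.from_tuples(pairs, names=["rec_id_1", "rec_id_2"])

def exact_matches(frame, pairs, columns):
    """Keep only the pairs whose values agree exactly on every given column."""
    left = frame.loc[pairs.get_level_values(0), columns].to_numpy()
    right = frame.loc[pairs.get_level_values(1), columns].to_numpy()
    agree = (left == right).all(axis=1)
    return pairs[agree]

pairs = candidate_pairs(df)
matches = exact_matches(df, pairs, ["given_name", "surname"])
print(list(matches))
# [('rec-101', 'rec-317')]
```

With the recordlinkage package the same principle should apply: set the identity column as the dataframe index before building the candidate pairs, since the pairs are built from the dataframe's index labels.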
The logic is: