Record Linkage in PySpark


How can I get record-linkage functionality in PySpark? I want to run a similarity check between the Name field of Dataset 1 and the Name field of Dataset 2.

Please suggest a library for this that works with PySpark, if one is available.

I tried Python's recordlinkage library, but it only works with pandas DataFrames.


There is 1 answer

Nick Crews On

Splink is the best option that I know of.
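
If a full linkage library is more than you need for a plain name-to-name comparison, PySpark's built-in string functions may already be enough. The sketch below is not Splink and not a replacement for it. It scores each pair of names as 1 minus the Levenshtein distance divided by the longer name's length. Everything named here is an assumption for illustration: both DataFrames are assumed to have a column called `name`, the 0.8 threshold is arbitrary, and `name_similarity` and `similar_name_pairs` are hypothetical helper names. The pure-Python function is only a reference for what the score means; the Spark function computes roughly the same thing with column expressions.

```python
def name_similarity(a: str, b: str) -> float:
    """Reference score: 1 - edit_distance / longer_length, after trimming and lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a and not b:
        return 1.0  # two empty names are treated as identical
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def similar_name_pairs(df1, df2, threshold=0.8):
    """Score name pairs from two Spark DataFrames, each assumed to have a `name` column."""
    from pyspark.sql import functions as F  # lazy import; requires pyspark to be installed

    left = df1.select(F.lower(F.trim(F.col("name"))).alias("name_1"))
    right = df2.select(F.lower(F.trim(F.col("name"))).alias("name_2"))
    # crossJoin compares every name with every other name; on large data,
    # join on a blocking key (for example the first letter) instead.
    pairs = left.crossJoin(right)
    distance = F.levenshtein(F.col("name_1"), F.col("name_2"))
    longest = F.greatest(F.length("name_1"), F.length("name_2"))
    score = F.when(longest == 0, F.lit(1.0)).otherwise(1 - distance / longest)
    return pairs.withColumn("similarity", score).where(F.col("similarity") >= threshold)
```

Calling `similar_name_pairs(dataset1, dataset2)` returns the pairs whose score clears the threshold. Rows with a null name drop out, because the score is null for them. The cross join grows with the product of the two row counts, which is the main reason to move to a library with blocking support once the data gets large.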