Left merge two dataframes based on recordlinkage pair matches (multi index)

356 views Asked by At
import pandas as pd
import recordlinkage as rl


lst_left = [...]
lst_right = [...]

df_left = pd.DataFrame(lst_left, columns=pd.Index(["city_id", "street_name"]))
df_right = pd.DataFrame(lst_right, columns=pd.Index(["city_id", "street_name"]))

indexer = rl.Index()
indexer.block("city_id")
pairs = indexer.index(df_left, df_right)
compare = rl.Compare(indexing_type="label")
compare.string("street_name", "street_name", method="damerau_levenshtein", threshold=0.7)
features = compare.compute(pairs, df_left, df_right)
matches = features[features[0] == 1.0]

And I get matches pairs MultiIndex

Out[4]: 
         0
0  0   1.0
1  1   1.0
2  2   1.0
4  3   1.0
6  5   1.0
7  6   1.0
8  7   1.0
10 8   1.0
12 9   1.0
13 10  1.0
14 11  1.0
15 12  1.0

And now I want to left join (sql left outer join) df_left and df_right dataframes based on those matches pairs keeping unmatched elements from df_left DataFrame.

How can I do that?

P.S. To get only matched records I use

df_left.loc[matches.index.get_level_values(0)].reset_index().merge(df_right.loc[matches.index.get_level_values(1)].reset_index(), how="left", left_index=True, right_index=True)

But I don't know how to merge and keep unmatched rows from left DataFrame.

Thank You

0

There are 0 answers