import pandas as pd
import recordlinkage as rl
lst_left = [...]
lst_right = [...]
df_left = pd.DataFrame(lst_left, columns=pd.Index(["city_id", "street_name"]))
df_right = pd.DataFrame(lst_right, columns=pd.Index(["city_id", "street_name"]))
indexer = rl.Index()
indexer.block("city_id")
pairs = indexer.index(df_left, df_right)
compare = rl.Compare(indexing_type="label")
compare.string("street_name", "street_name", method="damerau_levenshtein", threshold=0.7)
features = compare.compute(pairs, df_left, df_right)
matches = features[features[0] == 1.0]
And I get matches pairs MultiIndex
Out[4]:
0
0 0 1.0
1 1 1.0
2 2 1.0
4 3 1.0
6 5 1.0
7 6 1.0
8 7 1.0
10 8 1.0
12 9 1.0
13 10 1.0
14 11 1.0
15 12 1.0
And now I want to left join (sql left outer join) df_left and df_right dataframes based on those matches pairs keeping unmatched elements from df_left DataFrame.
How can I do that?
P.S. To get only matched records I use
df_left.loc[matches.index.get_level_values(0)].reset_index().merge(df_right.loc[matches.index.get_level_values(1)].reset_index(), how="left", left_index=True, right_index=True)
But I don't know how to merge and keep unmatched rows from left DataFrame.
Thank You