My data frame:
data = {'Col1': ['Bad Homburg', 'Bischofferode', 'Essen', 'Grabfeld OT Rentwertshausen','Großkrotzenburg','Jesewitz/Weg','Kirchen (Sieg)','Laudenbach a. M.','Nachrodt-Wiblingwerde','Rehburg-Loccum','Dingen','Burg (Dithmarschen)'],
'Col2': ['Rehburg-Loccum','Grabfeld','Laudenbach','Kirchen','Jesewitz','Großkrotzenburg','Nachrodt-','Essen/Stadt','Bischofferode','Bad Homburg','Münster','Burg']}
df = pd.DataFrame(data)
I have two columns in my df as below:
col1 | col2 |
---|---|
Bad Homburg | Rehburg-Loccum |
Bischofferode | Grabfeld |
Essen | Laudenbach |
Grabfeld OT Rentwertshausen | Kirchen |
Großkrotzenburg | Jesewitz |
Jesewitz/Weg | Großkrotzenburg |
Kirchen (Sieg) | Nachrodt- |
Laudenbach a. M. | Essen/Stadt |
Nachrodt-Wiblingwerde | Bischofferode |
Rehburg-Loccum | Bad Homburg |
Dingen | Münster |
Burg (Dithmarschen) | Burg |
I would like to look up col1 data in Col2. If the item is present I would like to write in the same row under the column Lookup_Value and I also Comment percentage of Matching. Below is my expected result:
col1 | col2 | Lookup_value | Comment |
---|---|---|---|
Bad Homburg | Rehburg-Loccum | Bad Homburg | 100% Matched |
Bischofferode | Grabfeld | Bischofferode | 100% Matched |
Essen | Laudenbach a. M. | Essen/Stadt | Best Possible Match |
Grabfeld OT Rentwertshausen | Kirchen | Grabfeld | Best Possible Match |
Großkrotzenburg | Jesewitz | Großkrotzenburg | 100% Matched |
Jesewitz/Weg | Großkrotzenburg | Jesewitz | Best Possible Match |
Kirchen (Sieg) | Nachrodt- | Kirchen | Best Possible Match |
Laudenbach | Essen/Stadt | Laudenbach a. M. | Best Possible Match |
Nachrodt-Wiblingwerde | Bischofferode | Nachrodt- | Best Possible Match |
Rehburg-Loccum | Bad Homburg | Rehburg-Loccum | 100% Matched |
Dingen | Münster | No Match | |
Burg (Dithmarschen) | Burg | Burg | Best Possible Match |
I am trying this way but not working:
def lookup_value_and_comment(row):
col1_value = row['Col1']
col2_value = row['Col2']
if col1_value in col2_value:
if col1_value == col2_value:
return pd.Series([col1_value, '100% Matched'], index=['Lookup_value', 'Comment'])
else:
return pd.Series([col2_value, 'Best Possible Match'], index=['Lookup_value', 'Comment'])
else:
return pd.Series(['', 'No Match'], index=['Lookup_value', 'Comment'])
df[['Lookup_value', 'Comment']] = df.apply(lookup_value_and_comment, axis=1)
print(df)
Here
col1_value == col2_value
you check the current rowcol1_value
with the currentrow col2_value
, but you need to check all the rows of column col2. Since the rows are not all the same, I applied word division of columns col1, col2 by delimiters:-, /, and space
into the list creating columnscol3, col4
.To access rows and split them, str is used, filtering is done using isin.
Output: