Lookup items of Col1 in Col2 and Comment the matching Percentage

Question

Lookup items of Col1 in Col2 and Comment the matching Percentage

83 views Asked by s_max At 27 October 2023 at 14:05

My data frame:

data = {'Col1': ['Bad Homburg', 'Bischofferode', 'Essen', 'Grabfeld OT Rentwertshausen','Großkrotzenburg','Jesewitz/Weg','Kirchen (Sieg)','Laudenbach a. M.','Nachrodt-Wiblingwerde','Rehburg-Loccum','Dingen','Burg (Dithmarschen)'],
        'Col2': ['Rehburg-Loccum','Grabfeld','Laudenbach','Kirchen','Jesewitz','Großkrotzenburg','Nachrodt-','Essen/Stadt','Bischofferode','Bad Homburg','Münster','Burg']}

df = pd.DataFrame(data)

I have two columns in my df as below:

col1	col2
Bad Homburg	Rehburg-Loccum
Bischofferode	Grabfeld
Essen	Laudenbach
Grabfeld OT Rentwertshausen	Kirchen
Großkrotzenburg	Jesewitz
Jesewitz/Weg	Großkrotzenburg
Kirchen (Sieg)	Nachrodt-
Laudenbach a. M.	Essen/Stadt
Nachrodt-Wiblingwerde	Bischofferode
Rehburg-Loccum	Bad Homburg
Dingen	Münster
Burg (Dithmarschen)	Burg

I would like to look up col1 data in Col2. If the item is present I would like to write in the same row under the column Lookup_Value and I also Comment percentage of Matching. Below is my expected result:

col1	col2	Lookup_value	Comment
Bad Homburg	Rehburg-Loccum	Bad Homburg	100% Matched
Bischofferode	Grabfeld	Bischofferode	100% Matched
Essen	Laudenbach a. M.	Essen/Stadt	Best Possible Match
Grabfeld OT Rentwertshausen	Kirchen	Grabfeld	Best Possible Match
Großkrotzenburg	Jesewitz	Großkrotzenburg	100% Matched
Jesewitz/Weg	Großkrotzenburg	Jesewitz	Best Possible Match
Kirchen (Sieg)	Nachrodt-	Kirchen	Best Possible Match
Laudenbach	Essen/Stadt	Laudenbach a. M.	Best Possible Match
Nachrodt-Wiblingwerde	Bischofferode	Nachrodt-	Best Possible Match
Rehburg-Loccum	Bad Homburg	Rehburg-Loccum	100% Matched
Dingen	Münster		No Match
Burg (Dithmarschen)	Burg	Burg	Best Possible Match

I am trying this way but not working:

def lookup_value_and_comment(row):
    col1_value = row['Col1']
    col2_value = row['Col2']
    
    if col1_value in col2_value:
        if col1_value == col2_value:
            return pd.Series([col1_value, '100% Matched'], index=['Lookup_value', 'Comment'])
        else:
            return pd.Series([col2_value, 'Best Possible Match'], index=['Lookup_value', 'Comment'])
    else:
        return pd.Series(['', 'No Match'], index=['Lookup_value', 'Comment'])

df[['Lookup_value', 'Comment']] = df.apply(lookup_value_and_comment, axis=1)

print(df)

Original Q&A

There are 1 answers

**inquirer** · Accepted Answer · 2023-10-29T14:20:35+00:00

Here col1_value == col2_value you check the current row col1_value with the current row col2_value, but you need to check all the rows of column col2. Since the rows are not all the same, I applied word division of columns col1, col2 by delimiters: -, /, and space into the list creating columns col3, col4.

To access rows and split them, str is used, filtering is done using isin.

import pandas as pd

df['col3'] = df['col1'].str.split('-|/| ').str[0]
df['col4'] = df['col2'].str.split('-|/| ').str[0]

def lookup_value_and_comment(row):
    col1_value = row['col1']
    col3_value = row['col3']
    ind = df['col4'].isin([col3_value])

    if (df['col2'].isin([col1_value])).any():
        return pd.Series([col1_value, '100% Matched'],
                         index=['Lookup_value', 'Comment'])
    elif ind.any():
        return pd.Series([df.loc[ind, 'col2'].values[0],
                          'Best Possible Match'], index=['Lookup_value', 'Comment'])
    else:
        return pd.Series(['', 'No Match'], index=['Lookup_value', 'Comment'])




df[['Lookup_value', 'Comment']] = df.apply(lookup_value_and_comment, axis=1)

Output:

                           col1              col2                             col3                  col4      Lookup_value              Comment
0                   Bad Homburg    Rehburg-Loccum                   [Bad, Homburg]     [Rehburg, Loccum]       Bad Homburg         100% Matched
1                 Bischofferode          Grabfeld                  [Bischofferode]            [Grabfeld]     Bischofferode         100% Matched
2                         Essen  Laudenbach a. M.                          [Essen]  [Laudenbach, a., M.]       Essen/Stadt  Best Possible Match
3   Grabfeld OT Rentwertshausen           Kirchen  [Grabfeld, OT, Rentwertshausen]             [Kirchen]          Grabfeld  Best Possible Match
4               Großkrotzenburg          Jesewitz                [Großkrotzenburg]            [Jesewitz]   Großkrotzenburg         100% Matched
5                  Jesewitz/Weg   Großkrotzenburg                  [Jesewitz, Weg]     [Großkrotzenburg]          Jesewitz  Best Possible Match
6                Kirchen (Sieg)         Nachrodt-                [Kirchen, (Sieg)]          [Nachrodt, ]           Kirchen  Best Possible Match
7                    Laudenbach       Essen/Stadt                     [Laudenbach]        [Essen, Stadt]  Laudenbach a. M.  Best Possible Match
8         Nachrodt-Wiblingwerde     Bischofferode         [Nachrodt, Wiblingwerde]       [Bischofferode]         Nachrodt-  Best Possible Match
9                Rehburg-Loccum       Bad Homburg                [Rehburg, Loccum]        [Bad, Homburg]    Rehburg-Loccum         100% Matched
10                       Dingen           Münster                         [Dingen]             [Münster]                               No Match
11          Burg (Dithmarschen)              Burg           [Burg, (Dithmarschen)]                [Burg]              Burg  Best Possible Match

TechQA.

Lookup items of Col1 in Col2 and Comment the matching Percentage

There are 1 answers

Related Questions in PYTHON

Related Questions in DATAFRAME

Related Questions in VLOOKUP

Related Questions in STRING-MATCHING

Related Questions in TEXTMATCHING

Popular Questions

Popular Tags

Trending Questions