Assigning a value to a column based on a mapping defined in a dictionary

Question

75 views Asked by sqaseemali At 22 September 2023 at 02:28

I am trying to write code that reads a CSV file, creates a dataframe from it, and then tags each row with the name of a dictionary key if one of the columns in that row contains a string listed under that key.

As an example, I have the following dictionary defined:

Sdiction={
        "Mgage" : ["ABC Gage","XYZ Gage"],
        "Rate" : ["deg/min","rad/s","rpm"]}

And I have the following dataframe:

Col A	Col B	Col C
1	30	ABC Gage
2	45	deg/min
3	150	Gage

I want to tag Col D for each row as

Row 1 - Col D = Mgage (since "ABC Gage" is listed under the key Mgage)

Row 2 - Col D = Rate (since "deg/min" is listed under the key Rate)

Row 3 - Col D = Mgage (since the string "Gage" appears, albeit partially, in the values under the key Mgage)

Expected output:

Col A	Col B	Col C	Col D
1	30	ABC Gage	Mgage
2	45	deg/min	Rate
3	150	Gage	Mgage

I am trying to figure out how to implement this part, have not yet implemented it, and therefore need help.

There are 2 answers

Soudipta Dutta On 01 February 2024 at 12:39

Vectorized operations such as NumPy's np.isin and pandas' str.lower are optimized for large arrays, so they run faster than a row-by-row Python loop.

Building the lookup table uses slightly more memory, but that cost is usually outweighed by the faster execution. Note that this approach only finds exact (case-insensitive) matches, so a partial string such as "Gage" is left unchanged.

import pandas as pd
import numpy as np

# Named Sdiction (as in the question) to avoid shadowing the built-in dict
Sdiction = {
    "Mgage": ["ABC Gage", "XYZ Gage"],
    "Rate": ["deg/min", "rad/s", "rpm"]
}
df = pd.DataFrame({
    "Col A": [1, 2, 3, 4, 5],
    "Col B": [30, 45, 150, 70, 60],
    "Col C": ["ABC Gage", "deg/min", "Gage", "rad/s", "rpm"]
})

# Reverse lookup: lowercased value -> dictionary key
lookup_table = {v.lower(): k for k, values in Sdiction.items() for v in values}
# print(lookup_table)
# {'abc gage': 'Mgage', 'xyz gage': 'Mgage', 'deg/min': 'Rate', 'rad/s': 'Rate', 'rpm': 'Rate'}

df['Col_C_lower'] = df['Col C'].str.lower()

# Vectorized exact-match test using NumPy
matches = np.isin(df['Col_C_lower'].to_numpy(), list(lookup_table.keys()))
# print(matches)
# [ True  True False  True  True]

# Map matched rows to their dictionary key; unmatched rows keep the original value
df['Col_Matches'] = df['Col_C_lower'].map(lookup_table).where(matches, df['Col C'])

# Drop the temporary column, optional
# df.drop('Col_C_lower', axis=1, inplace=True)
# print(df)
#    Col A  Col B     Col C Col_C_lower   Col_Matches
# 0      1     30  ABC Gage    abc gage       Mgage
# 1      2     45   deg/min     deg/min        Rate
# 2      3    150      Gage        gage        Gage
# 3      4     70     rad/s       rad/s        Rate
# 4      5     60       rpm         rpm        Rate
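
Because the exact-match lookup leaves row 3 ("Gage") untagged, here is a minimal sketch of one way to also cover partial matches. It assumes that a "partial match" means the cell text is a substring of a dictionary value, or the other way round; the helper name tag and the choice to return the first matching key are illustrative assumptions, not part of the original answer. It trades the vectorized lookup for a plain Python loop per cell.

```python
import pandas as pd

Sdiction = {
    "Mgage": ["ABC Gage", "XYZ Gage"],
    "Rate": ["deg/min", "rad/s", "rpm"],
}
df = pd.DataFrame({
    "Col A": [1, 2, 3],
    "Col B": [30, 45, 150],
    "Col C": ["ABC Gage", "deg/min", "Gage"],
})

# Flatten the dictionary into (lowercased value, key) pairs
pairs = [(v.lower(), k) for k, values in Sdiction.items() for v in values]

def tag(cell):
    """Hypothetical helper: return the first key whose value contains,
    or is contained in, the cell text (case-insensitive); else None."""
    text = str(cell).lower()
    if not text:
        return None  # an empty string would otherwise match everything
    for value, key in pairs:
        if text in value or value in text:
            return key
    return None

df["Col D"] = df["Col C"].map(tag)
print(df)
```

If a cell could match values under several keys, this sketch silently picks whichever comes first in the dictionary, so the order of Sdiction matters.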

**mozway** · Accepted Answer · 2023-09-22T06:45:10+00:00

Using a regex match:

import re

s = df['Col C'].str.casefold()
pattern = '(%s)' % '|'.join(map(re.escape, s))
# '(abc\\ gage|deg/min|gage)'

# reverse dictionary
tmp = pd.Series({v.casefold(): k for k, l in Sdiction.items()
                 for v in l}, name='ref').reset_index()

# extract first match, map reference key
df['Col D'] = s.map(tmp.assign(match=tmp['index'].str.extract(pattern))
                       .dropna(subset=['match'])
                       .set_index('match')['ref']
                    )

Output:

   Col A  Col B     Col C  Col D
0      1     30  ABC Gage  Mgage
1      2     45   deg/min   Rate
2      3    150      Gage  Mgage
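
For readers who want to run the regex approach on its own, the following is a self-contained sketch of the same idea with the intermediate steps split out. The sample df and Sdiction are copied from the question; expand=False and drop_duplicates are added here as precautions (the latter because mapping through a Series with a duplicated index would fail) and are assumptions, not part of the answer above.

```python
import re
import pandas as pd

Sdiction = {
    "Mgage": ["ABC Gage", "XYZ Gage"],
    "Rate": ["deg/min", "rad/s", "rpm"],
}
df = pd.DataFrame({
    "Col A": [1, 2, 3],
    "Col B": [30, 45, 150],
    "Col C": ["ABC Gage", "deg/min", "Gage"],
})

# Casefolded column values become the alternatives of one regex
s = df["Col C"].str.casefold()
pattern = "(%s)" % "|".join(map(re.escape, s))

# Reverse dictionary: one row per casefolded value, with its key in 'ref'
ref = pd.Series(
    {v.casefold(): k for k, values in Sdiction.items() for v in values},
    name="ref",
).reset_index()

# For each dictionary value, find which column value (if any) occurs in it
ref["match"] = ref["index"].str.extract(pattern, expand=False)

# Keep one key per matched column value, then map it back onto the rows
mapping = (
    ref.dropna(subset=["match"])
       .drop_duplicates("match")
       .set_index("match")["ref"]
)
df["Col D"] = s.map(mapping)
print(df)
```

Note the direction of the match: the column values are searched for inside the dictionary values, so "Gage" is found in "xyz gage", but a cell longer than every dictionary value would not be tagged.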
