I'm trying to pass different pandas dataframes to a function that does some string modification (usually str.replace operation on columns based on mapping tables stored in CSV files) and return the modified dataframes. And I'm encountering errors especially with handling the dataframe as a parameter.
The mapping table in CSV is structured as follows:
| From(Str) | To(Str) | Regex(True/False) |
|---|---|---|
| A | A2 | |
| B | B2 | |
| CD (.*) FG | CD FG | True |
My code looks as something like this:
def apply_mapping_table (p_df, p_df_col_name, p_mt_name):
df_mt = pd.read_csv(p_mt_name)
for index in range(df_mt.shape[0]):
# If regex is true
if df_mt.iloc[index][2] is True:
# perform regex replacing
df_p[p_df_col_name] = df_p[p_df_col_name].replace(to_replace=df_mt.iloc[index][0], value = df_mt.iloc[index][1], regex=True)
else:
# perform normal string replacing
p_df[p_df_col_name] = p_df[p_df_col_name].replace(df_mt.iloc[index][0], df_mt.iloc[index][1])
return df_p
df_new1 = apply_mapping_table1(df_old1, 'Target_Column1', 'MappingTable1.csv')
df_new2 = apply_mapping_table2(df_old2, 'Target_Column2', 'MappingTable2.csv')
I'm getting 'IndexError: single positional indexer is out-of-bounds' for 'df_mt.iloc[index][2]' and haven't gone to the portion where the actual replacement is happening. Any suggestions to make it work or even a better way to do the dataframe string replacements based on mapping tables?
You can use the
.iterrows()function to iterate through lookup table rows. Generally, the.iterrows()function is slow, but in this case because the lookup table should be a small manageable table it will be completely fine.You can adapt your give function as I did in the following snippet: