Compare two pandas df rowwise and apply conditional formating to matching values

46 views Asked by At

I think its a small problem, but I did not succeed with a code solution. I have two dataframes df_diff and df_all. Both pandas df have the same key column (e.g. "Key") but different column names.

The code should iterate over rows of df_diff, take the key value, look up the row with the key value in df_all, and then iterate over all cells of this row of df_diff and search if any of the cells matches one cell value of the corresponding row in df_all.

If there is a match, this cell should receive red background color.

Note that the column names a different between these data frames, except the Key column.

Here is an example input: df_diff

Key Column_1 Column 2
Key2 Value2 Value3
Key3 Value3 Value4
Key4 Value5 Value6

df_all

Key Column_all_A Column_all_B
Key2 Value8 Value2
Key3 Value3 Value10
Key6 Value0 Value11

The Expected Output: Expected Output of df_all with conditioned formatting

2

There are 2 answers

2
Timeless On BEST ANSWER

Here is one of the options that uses a mapper with a listcomp to build the styles :

lstyles = [
    ["background-color:lightcoral" # <-- adjust the color here
    if v in df_diff.set_index("Key").T.to_dict("list").get(k, []) else ""
    for v in vals] for k, *vals in df_all.values
]

use_cols = df_all.columns.difference(["Key"])

out = (
    df_all.style.apply(lambda _: pd.DataFrame(lstyles, columns=use_cols),
                       axis=None, subset=use_cols)
)

Output :

enter image description here

0
NeuroNaut On

Here is my answer to my own question:

import pandas as pd

# Sample data for df_diff
data_diff = {
'Key': ['Key2', 'Key3', 'Key4'],
'Column_1': ['Value2', 'Value3', 'Value5'],
'Column_2': ['Value3', 'Value4', 'Value6']
}
df_diff = pd.DataFrame(data_diff)

# Sample data for df_all
data_all = {
'Key': ['Key2', 'Key3', 'Key6'],
'Column_all_A': ['Value8', 'Value3', 'Value0'],
'Column_all_B': ['Value2', 'Value10', 'Value11']
}
df_all = pd.DataFrame(data_all)

# Function to find matching cells and apply red background to df_all
def highlight_matching_cells(row_all):
    # Get the key value from the current row in df_all
    key_value = row_all['Key']

    # Filter the corresponding row in df_diff using the key value
    row_diff = df_diff[df_diff['Key'] == key_value]

    # Check if a matching row is found in df_diff
    if not row_diff.empty:
        # Iterate over columns in df_all (except the 'Key' column)
        for col_all in row_all.index[1:]:
            # Iterate over columns in the matching row of df_diff (except the 'Key' column)
            for col_diff in row_diff.columns[1:]:
                # Check if the cell value in df_all matches any cell value in the matching row of df_diff
                if row_all[col_all] == row_diff[col_diff].iloc[0]:
                # If a match is found, return a list with red background for the matched cell in df_all
                    return ['background-color: red' if col == col_all else '' for col in row_all.index]
    # If no match is found, return a list with no background color for all cells in df_all
    return ['' for _ in row_all.index]

# Apply the function to each row in df_all
df_highlighted = df_all.style.apply(highlight_matching_cells, axis=1)

# Display the highlighted DataFrame
df_highlighted

This give me the desired output: enter image description here

However, does anyone have a more elegant and shorter way? I would like to define a styler() function with formatting condition in it and the apply the format using df.style.apply() or df.style.applymap() to each of the matching cells.