Compare two pandas DataFrames row-wise and apply conditional formatting to matching values

Question

46 views Asked by NeuroNaut At 02 August 2023 at 22:50

I think it's a small problem, but I have not managed to solve it in code. I have two DataFrames, df_diff and df_all. Both have the same key column (e.g. "Key") but otherwise different column names.

The code should iterate over the rows of df_diff, take each key value, look up the row with that key in df_all, and then check whether any cell in the df_diff row matches a cell value in the corresponding df_all row.

If there is a match, the matching cell should get a red background color.

Note that the column names are different between the two DataFrames, except for the Key column.

Here is an example input:

df_diff

Key	Column_1	Column_2
Key2	Value2	Value3
Key3	Value3	Value4
Key4	Value5	Value6

df_all

Key	Column_all_A	Column_all_B
Key2	Value8	Value2
Key3	Value3	Value10
Key6	Value0	Value11

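The expected output: the cells holding Value2 (row Key2) and Value3 (row Key3) get a red background, since those values appear in the row with the same key in the other DataFrame. Rows whose key has no counterpart stay unstyled.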

There are 2 answers

NeuroNaut On 03 August 2023 at 00:13

Here is my answer to my own question:

import pandas as pd

# Sample data for df_diff
data_diff = {
'Key': ['Key2', 'Key3', 'Key4'],
'Column_1': ['Value2', 'Value3', 'Value5'],
'Column_2': ['Value3', 'Value4', 'Value6']
}
df_diff = pd.DataFrame(data_diff)

# Sample data for df_all
data_all = {
'Key': ['Key2', 'Key3', 'Key6'],
'Column_all_A': ['Value8', 'Value3', 'Value0'],
'Column_all_B': ['Value2', 'Value10', 'Value11']
}
df_all = pd.DataFrame(data_all)

# Function to find matching cells and apply red background to df_all
def highlight_matching_cells(row_all):
    # Get the key value from the current row in df_all
    key_value = row_all['Key']

    # Filter the corresponding row in df_diff using the key value
    row_diff = df_diff[df_diff['Key'] == key_value]

    # If the key is missing from df_diff, leave the whole row unstyled
    if row_diff.empty:
        return ['' for _ in row_all.index]

    # Collect the non-key values of the matching row in df_diff
    diff_values = set(row_diff.drop(columns='Key').iloc[0])

    # Red background for every non-key cell of df_all whose value also occurs
    # in the matching df_diff row (not only the first match)
    return ['background-color: red' if col != 'Key' and row_all[col] in diff_values else ''
            for col in row_all.index]

# Apply the function to each row in df_all
df_highlighted = df_all.style.apply(highlight_matching_cells, axis=1)

# Display the highlighted DataFrame
df_highlighted

This gives me the desired output.

However, does anyone have a shorter, more elegant way? I would like to define a styler function that contains the formatting condition and then apply it to each of the matching cells using df.style.apply() or df.style.applymap().
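One way to get the kind of reusable styler function asked for here is sketched below. The helper name css_for_matches and its parameters are made up for illustration; the only pandas features it relies on are DataFrame.melt and the fact that Styler.apply with axis=None accepts a function returning a same-shaped DataFrame of CSS strings. Treat it as a sketch under those assumptions, not a definitive implementation.

```python
import pandas as pd

df_diff = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key4"],
    "Column_1": ["Value2", "Value3", "Value5"],
    "Column_2": ["Value3", "Value4", "Value6"],
})
df_all = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key6"],
    "Column_all_A": ["Value8", "Value3", "Value0"],
    "Column_all_B": ["Value2", "Value10", "Value11"],
})


def css_for_matches(frame, other, key="Key", css="background-color: red"):
    """Return a frame-shaped DataFrame of CSS strings (hypothetical helper)."""
    # Every (key, value) pair that occurs anywhere in `other`
    long = other.melt(id_vars=key)
    pairs = set(zip(long[key], long["value"]))

    # Start with no styling, then mark cells whose (key, value) pair is known
    styles = pd.DataFrame("", index=frame.index, columns=frame.columns)
    for col in frame.columns.drop(key):
        hit = [(k, v) in pairs for k, v in zip(frame[key], frame[col])]
        styles.loc[hit, col] = css
    return styles


styles = css_for_matches(df_all, df_diff)
print(styles)

# In a notebook, the same function plugs into the Styler:
# df_all.style.apply(css_for_matches, other=df_diff, axis=None)
```

Swapping the arguments, css_for_matches(df_diff, df_all), styles df_diff instead, which is closer to how the question is worded.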

**Timeless** · Accepted Answer · 2023-08-03T00:29:36+00:00

Here is one option that uses a mapper with a list comprehension to build the styles:

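# Map each key of df_diff to the list of values in its row (built once)
mapper = df_diff.set_index("Key").T.to_dict("list")

# Non-key columns of df_all, kept in their original order
use_cols = [c for c in df_all.columns if c != "Key"]

lstyles = [
    ["background-color:lightcoral"  # <-- adjust the color here
     if v in mapper.get(k, []) else ""
     for v in vals] for k, *vals in df_all.values  # assumes "Key" is the first column
]

out = (
    df_all.style.apply(lambda _: pd.DataFrame(lstyles, index=df_all.index, columns=use_cols),
                       axis=None, subset=use_cols)
)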

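Output: Value2 in the Key2 row and Value3 in the Key3 row of df_all are highlighted in light coral; the Key6 row is left unstyled.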
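To see what the list comprehension is working with, the mapper expression and the resulting style grid can be inspected without rendering anything. The snippet below rebuilds the sample frames from the question and prints both intermediate values; it is only an illustration of the data flow, using the same expressions as the answer.

```python
import pandas as pd

df_diff = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key4"],
    "Column_1": ["Value2", "Value3", "Value5"],
    "Column_2": ["Value3", "Value4", "Value6"],
})
df_all = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key6"],
    "Column_all_A": ["Value8", "Value3", "Value0"],
    "Column_all_B": ["Value2", "Value10", "Value11"],
})

# Key -> list of the values found in that key's df_diff row
mapper = df_diff.set_index("Key").T.to_dict("list")
print(mapper)
# {'Key2': ['Value2', 'Value3'], 'Key3': ['Value3', 'Value4'], 'Key4': ['Value5', 'Value6']}

# One list of CSS strings per df_all row, one entry per non-key column
lstyles = [
    ["background-color:lightcoral" if v in mapper.get(k, []) else "" for v in vals]
    for k, *vals in df_all.values
]
print(lstyles)
# [['', 'background-color:lightcoral'], ['background-color:lightcoral', ''], ['', '']]
```

Key6 does not occur in df_diff, so mapper.get returns the empty default and its row stays unstyled.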
