Compare two dataframes in Python & R using one key variable

52 views Asked by At

I have two dataframes df1 & df2 listed below

ID  Age Weight
1   30  62
2   40  80
3   28  77
4   35  70

ID  Age Weight
1   30  62
2   40  80
3   28  97
4   35  87
5   25  65
6   33  70

i want to compare Weight variable in df1 with df2 using ID field and i need the difference showing below

ID  Weight_df1  Weight_df2
3     77             97
4     70             87
2

There are 2 answers

0
AudioBubble On

Here is a tidyverse approach in R that will do it all.

library(dplyr)
library(purrr)
library(tidyr)

map_dfr(list(df1, df2), I, .id = "df") %>%
  group_by(ID, Age, Weight) %>%
  filter(n() == 1) %>%
  group_by(ID) %>%
  filter(n() > 1) %>%
  pivot_wider(id_cols = "ID", values_from = "Weight", names_from = "df", names_prefix = "Weight_df")

# A tibble: 2 x 3
# Groups:   ID [2]
     ID Weight_df1 Weight_df2
  <dbl>      <dbl>      <dbl>
1     3         77         97
2     4         70         87

data

library(tibble)

df1 <- tribble(~ID,  ~Age, ~Weight,
               1,   30,  62,
               2,   40,  80,
               3,   28,  77,
               4,   35,  70)

df2 <- tribble(~ID,  ~Age, ~Weight,
               1,   30,  62,
               2,   40,  80,
               3,   28,  97,
               4,   35,  87,
               5,   25,  65,
               6,   33,  70)
2
anky On

Pandas version, use merge() with query():

(df1[['ID','Weight']].merge(df2[['ID','Weight']],on='ID',suffixes=('_df1','_df2')) 
                             .query('Weight_df1!=Weight_df2'))

   ID  Weight_df1  Weight_df2
2   3          77          97
3   4          70          87