How would I go about joining or merging data frames of different sizes while also overwriting missing values in R?

482 views Asked by At

R Question

I am looking to join multiple data frames of unequal size. While joining the data frames, I would like to overwrite any NAs. I attempted to use the coalesce function, but equal sized data frames were required.

Example

x <- data.frame(
    ID = c(1,2,3,4,5),
    Location = c("Georgia", NA, NA, "Idaho", "Texas"),
    Cost = c(NA, 200, NA, 400, 500)
)

y <- data.frame(
    ID = c(1, 2, 3),
    Location = c("Wyoming", "Florida", "Toronto"),
    Cost = c(150, 100, 450)
)

Desired Result

ID  Location   Cost
1   Georgia    150
2   Florida    200
3   Toronto    450
4   Idaho      400
5   Texas      500
1

There are 1 answers

2
Ronak Shah On

You can do a full_join and then use coalesce for Location and Cost columns.

library(dplyr)

full_join(x, y, by = 'ID') %>%
  mutate(Location = coalesce(Location.x, Location.y), 
         Cost = coalesce(Cost.x, Cost.y)) %>%
 select(names(x))

#  ID Location Cost
#1  1  Georgia  150
#2  2  Florida  200
#3  3  Toronto  450
#4  4    Idaho  400
#5  5    Texas  500

In base R, we can use ifelse to select values from Location and Cost columns.

transform(merge(x, y, by = 'ID', all = TRUE), 
          Location = ifelse(is.na(Location.x), Location.y, Location.x), 
          Cost = ifelse(is.na(Cost.x), Cost.y, Cost.x))[names(x)]