R: Compare columns in two tables for missing values and produce True False in a new column

644 views Asked by At

I have email data that I'd like to compare. To gather the data I had to pull sends, deliveries, opens, etc. from separate tables, so I essentially have 5 data frames with identical information, but the send table has 100% of user ID's that were mailed. The delivery, open, etc. tables contain the exact same variables/columns, but just less and less of the rows since not everyone that was sent an email opened it, clicked it, etc.

What I'd like to do is combine all of this into the sent data frame, creating a new column that has a Y/N for each user saying whether that user received the email, opened it, clicked, by comparing whether the USER ID exists in the subsequent tables. So sort of a semi-join, but I just want to create a new column in the 1st table that says whether each unique USER ID exists in the other tables or not. Simplified example, I have the first column from each of the two tables below.

Sent       USER ID  1  3  17  26  35  124
           Deliv?   Y  N  Y   N   Y   Y
Delivered  USER ID  1  17  35  124

Tried using mutate with match, then with ifelse, but no dice so far.

Thanks!

1

There are 1 answers

4
Rui Barradas On

Try the following.

x <- scan(text = "1  3  17  26  35  124")
y <- scan(text = "1  17  35  124")
z <- factor(x %in% y, labels = c("N", "Y"))
z
#[1] Y N Y N Y Y
#Levels: N Y

Of course, the names of your variables will be different. But the method is this one.

EDIT.
Here is another way, if you don't want a result of class factor.

z2 <- c("N", "Y")[(x %in% y) + 1L]
z2
#[1] "Y" "N" "Y" "N" "Y" "Y"

EDIT 2.
Note that instead of TRUE/FALSE it's possible to convert the logical values into binary integers in a number of ways. The more readable one would be to use as.integer.

z <- factor(as.integer(x %in% y), labels = c("N", "Y"))

[Two other ways are hacks, to add zero or to multiply by one: (x %in% y) + 0L or (x %in% y)*1L.]

Then you would create a new column in data.frame sent with the result of this.

sent$Deliv <- z    # or z2

Or even more simple, don't create the intermediate variable z (or z2) and assign factor(...) directly.

sent$Deliv <- factor(as.integer(x %in% y), labels = c("N", "Y"))