I have email data that I'd like to compare. To gather the data I had to pull sends, deliveries, opens, etc. from separate tables, so I essentially have 5 data frames with identical information, but the send table has 100% of user ID's that were mailed. The delivery, open, etc. tables contain the exact same variables/columns, but just less and less of the rows since not everyone that was sent an email opened it, clicked it, etc.
What I'd like to do is combine all of this into the sent data frame, creating a new column that has a Y/N for each user saying whether that user received the email, opened it, clicked, by comparing whether the USER ID exists in the subsequent tables. So sort of a semi-join, but I just want to create a new column in the 1st table that says whether each unique USER ID exists in the other tables or not. Simplified example, I have the first column from each of the two tables below.
Sent USER ID 1 3 17 26 35 124
Deliv? Y N Y N Y Y
Delivered USER ID 1 17 35 124
Tried using mutate with match, then with ifelse, but no dice so far.
Thanks!
Try the following.
Of course, the names of your variables will be different. But the method is this one.
EDIT.
Here is another way, if you don't want a result of class
factor.EDIT 2.
Note that instead of
TRUE/FALSEit's possible to convert the logical values into binary integers in a number of ways. The more readable one would be to useas.integer.[Two other ways are hacks, to add zero or to multiply by one:
(x %in% y) + 0Lor(x %in% y)*1L.]Then you would create a new column in data.frame
sentwith the result of this.Or even more simple, don't create the intermediate variable
z(orz2) and assignfactor(...)directly.