How to get common values between two columns in R?

1.8k views Asked by At

Dataframe as an example:

df <- data.frame(x = c("A,B,C","A,D","B,C,E","C,E,G"),
                 y = c("A","D","A",NA),
                 MyAim = c("A","D","",""))

      x    y MyAim
1 A,B,C    A     A
2   A,D    D     D
3 B,C,E    A      
4 C,E,G <NA>      

I want to get common values between x and y columns in a new one. Thanks in advance.

3

There are 3 answers

2
Ronak Shah On BEST ANSWER

We can use mapply :

df$Z <- mapply(function(x, y) {
            temp <- intersect(x, y)
            if(length(temp)) temp else ""
        }, strsplit(df$x, ","), df$y)

df
#      x    y Z
#1 A,B,C    A A
#2   A,D    D D
#3 B,C,E    A  
#4 C,E,G <NA>  

If there are multiple values in y, we can split the string in y and return a comma-separated value.

df$Z <- mapply(function(x, y) {
     temp <- intersect(x, y)
     if(length(temp)) toString(temp) else ""
     }, strsplit(df$x, ","), strsplit(df$y, ","))

data

df <- data.frame(x = c("A,B,C","A,D","B,C,E","C,E,G"),
                 y = c("A","D","A",NA),
                 stringsAsFactors = FALSE)
0
Edward On

If x is character, then the following is one of many ways to do this:

intersect(unlist(strsplit(df$x, split=",")), df$y)

If x is not a character, then strsplit will crash, so the following is required:

intersect(unlist(strsplit(as.character(df$x), split=",")), df$y)

And to add this to the data frame,

myAim <- intersect(unlist(strsplit(as.character(df$x), split=",")), df$y)
df$myAim <- c(myAim, rep(NA, nrow(df)-length(myAim)))
df
      x    y myAim
1 A,B,C    A     A
2   A,D    D     D
3 B,C,E    A  <NA>
4 C,E,G <NA>  <NA>

Note: If y contained values like in x, then the length of myAim may be higher than the number of rows of the data frame. In that situation, adding the result to the data frame may not seem appropriate.

1
jay.sf On

strsplit could easily be used in an apply which coerces to character, try

df <- transform(df, MyAim=apply(df, 1, function(x) {
  s <- el(strsplit(x[1], ","))
  s[match(x[2], s)]
  }))
df
# x    y MyAim
# 1 A,B,C    A     A
# 2   A,D    D     D
# 3 B,C,E    A  <NA>
# 4 C,E,G <NA>  <NA>