Avoid creating columns that do not match any strings

I want to create new columns based on string matches. I am able to create them, but my code also creates columns for patterns that do not match anything. For example:

      x <- data.frame(name = c("Java Hackathon", "Intro to Graphs", "Hands on Cypher"))
      toMatch <- c("Hackathon", "Hands on", "Test", "java")

      ## Sentence with phrases: one logical column per pattern
      phrases11 <- as.vector(toMatch)
      res <- sapply(phrases11, grepl, x = as.character(x$name), ignore.case = TRUE)
      rownames(res) <- x$name

      # Replacement: write the matched pattern into each TRUE cell
      ones <- which(res == 1, arr.ind = TRUE)
      res[ones] <- colnames(res)[ones[, 2]]
      res

Output:

                      Hackathon   Hands on   Test    java
      Java Hackathon  "Hackathon" "FALSE"    "FALSE" "java"
      Intro to Graphs "FALSE"     "FALSE"    "FALSE" "FALSE"
      Hands on Cypher "FALSE"     "Hands on" "FALSE" "FALSE"

I don't want the "Test" column to be created, because I have a huge amount of data to match. Can the line res <- sapply(phrases11, grepl, x = as.character(x$name), ignore.case = TRUE) be changed so that it only creates columns for the entries of the 'toMatch' vector that actually match something? Or is there another approach?

There is 1 answer

Answered by Onyambu

Since you are using the grepl() function, which returns TRUE or FALSE, each column sum counts the matches for one pattern, so you can drop the columns whose sum is 0:

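  A <- sapply(toMatch, grepl, as.character(x$name), ignore.case = TRUE)
  # Keep every column with at least one match (== 1 would drop patterns that match several rows)
  A[, colSums(A) > 0, drop = FALSE]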
     Hackathon Hands on  java
[1,]      TRUE    FALSE  TRUE
[2,]     FALSE    FALSE FALSE
[3,]     FALSE     TRUE FALSE
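
Putting the two steps together, here is a minimal sketch that filters the columns first and then applies the replacement step from the question. It assumes the same x and toMatch as in the question; the names hits and labelled are only illustrative. The filter tests colSums(hits) > 0, so a pattern that matches more than one row is still kept, and drop = FALSE keeps the result a matrix even if only one column survives.

```r
x <- data.frame(name = c("Java Hackathon", "Intro to Graphs", "Hands on Cypher"))
toMatch <- c("Hackathon", "Hands on", "Test", "java")

# One logical column per pattern, one row per name
hits <- sapply(toMatch, grepl, x = as.character(x$name), ignore.case = TRUE)
rownames(hits) <- as.character(x$name)

# Keep only the patterns that matched at least one name
hits <- hits[, colSums(hits) > 0, drop = FALSE]

# Replace each TRUE with the pattern that matched, as in the question
labelled <- hits
ones <- which(hits, arr.ind = TRUE)
labelled[ones] <- colnames(hits)[ones[, 2]]
labelled
```

With the sample data, labelled has the columns Hackathon, Hands on and java, and no Test column. Note that grepl() is still run once for every pattern; the filter only removes the empty columns afterwards.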