I want to create new columns based on string matches. I can create them, but columns are also created for patterns that do not match anything. For example:
x = data.frame(name = c("Java Hackathon", "Intro to Graphs", "Hands on Cypher"))
toMatch <- c("Hackathon","Hands on","Test","java")
##Sentence with phrases
phrases11 <- as.vector(toMatch)
res <- sapply(phrases11, grepl, x = as.character(x$name), ignore.case = TRUE)
rownames(res) <- x$name
#replacement
ones <- which(res==1, arr.ind=T)
res[ones]<-colnames(res)[ones[,2]]
res
Output:
                Hackathon   Hands on   Test    java
Java Hackathon  "Hackathon" "FALSE"    "FALSE" "java"
Intro to Graphs "FALSE"     "FALSE"    "FALSE" "FALSE"
Hands on Cypher "FALSE"     "Hands on" "FALSE" "FALSE"
I don't want the "Test" column to be created, because I have a huge amount of data to match. Can the line res <- sapply(phrases11, grepl, x = as.character(x$name), ignore.case = TRUE) be changed so that it only creates columns for the patterns in the 'toMatch' vector that actually match something? Or is there another approach?
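One possible alternative, offered here as a sketch rather than taken from the original post, is to run grepl() once per pattern, keep only the patterns that match at least one name, and bind just those into the matrix. The example data is repeated so the block runs on its own; names_chr and matches are hypothetical helper names introduced for illustration.

```r
# Example data from the question, repeated so this sketch is self-contained
x <- data.frame(name = c("Java Hackathon", "Intro to Graphs", "Hands on Cypher"))
toMatch <- c("Hackathon", "Hands on", "Test", "java")
names_chr <- as.character(x$name)  # as.character() also covers factor columns

# One logical vector per pattern (case-insensitive), named by the pattern
matches <- lapply(toMatch, grepl, x = names_chr, ignore.case = TRUE)
names(matches) <- toMatch

# Drop patterns that match no name at all, so no all-FALSE column is ever built
matches <- Filter(any, matches)

# Bind the surviving patterns into a logical matrix (one column per pattern)
res <- do.call(cbind, matches)
rownames(res) <- names_chr

# Same replacement step as in the question: TRUE cells get the pattern name;
# assigning character values coerces the remaining cells to "FALSE"
ones <- which(res, arr.ind = TRUE)
res[ones] <- colnames(res)[ones[, 2]]
res
```

Each pattern is still scanned once, so this does not reduce the matching work; it only avoids storing the empty columns in the result. If no pattern matches at all, do.call(cbind, ...) receives an empty list and returns NULL, so a real pipeline would want to check for that before setting row names.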
Since you are using grepl(), which returns TRUE or FALSE, you can eliminate the columns whose sum is 0: