I'm trying to access character strings from a vector in a for-loop.
I have a Corpus like this one:
library(tm)
corpus = Corpus(VectorSource(c("cfilm,cgame,ccd","cd,film,cfilm")))
My goal is to get rid of all unnecessary "c" characters. Note that this means I don't want to remove the c from cd, only from ccd, cgame and so forth.
I use this function, which takes a corpus and replaces one term with another.
toString = content_transformer(function(x,from,to)gsub(from, to, x))
So, for example, to replace cgame with game, I use
corpus = tm_map(corpus,toString,"cgame","game")
Now, instead of repeating this line for every term, I'd like to use a loop that iterates over all the possible replacements, using a vector with the relevant terms.
replace = c("game","film","cd")
I tried two approaches, but neither of them works:
for(i in replace){tm_map(corpus,toString,paste("c",get(i),sep=""),get(i))}
and
for(i in 1:length(replace)){tm_map(corpus,toString,paste("c",replace[i],sep=""),replace[i])}
In the first case, R tells me that it can't find the object requested by get(i): Error in get(i) : object 'game' not found.
In the second, there is no error message, but nothing changes within the corpus.
How can I access the items of a vector as strings, so that the for-loop repeats for every term what I did with corpus = tm_map(corpus,toString,"cgame","game")?
The tm_map function doesn't modify the corpus in place; it returns a modified corpus. Right now you are not doing anything to save the result, which is why your second loop runs without an error but changes nothing. Assign the return value of tm_map back to corpus on each iteration.
Also, toString is actually the name of a function in base R, so it's not a good idea to write your own function with the same name.
Finally, the get() function returns the value of an R variable with the same name as the character value you pass it. There is no reason to use get() here, since you want to continue working with strings and not variable names.
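Putting these points together, a minimal sketch of the loop might look like the following. It reuses the corpus and the replace vector from the question. The name replaceTerm is only an illustrative choice to avoid masking base R's toString, and paste0("c", term) stands in for paste("c", term, sep=""). Depending on the tm version, tm_map may emit a warning on a corpus built with Corpus(); the replacement itself should still be applied.

```r
library(tm)

corpus <- Corpus(VectorSource(c("cfilm,cgame,ccd", "cd,film,cfilm")))

# replaceTerm is an illustrative name, chosen so base::toString is not masked
replaceTerm <- content_transformer(function(x, from, to) gsub(from, to, x))

replace <- c("game", "film", "cd")

for (term in replace) {
  # tm_map returns a new corpus, so assign the result back on every pass;
  # term is already a string, so no get() is needed
  corpus <- tm_map(corpus, replaceTerm, paste0("c", term), term)
}

# Collect the document texts as a plain character vector for inspection
result <- unname(sapply(corpus, as.character))
print(result)
```

Because each pattern carries the leading "c" (for example "ccd"), a standalone "cd" is left untouched, which matches the requirement stated in the question.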