I'm trying to access character strings from a vector in a for-loop.
I have a Corpus like this one:
library(tm)
corpus = Corpus(VectorSource(c("cfilm,cgame,ccd","cd,film,cfilm")))
My goal is to get rid of all unnecessary "c" characters. Note that this means I don't want to remove the c from cd, only the extra c from ccd, cgame and so forth.
I use this function, which takes a corpus and replaces one term with another.
toString = content_transformer(function(x,from,to)gsub(from, to, x))
So, for example, to replace cgame with game, I use
corpus = tm_map(corpus,toString,"cgame","game")
Now, instead of repeating this line for every term, I'd like to use a loop that iterates over all the replacements, using a vector with the relevant terms.
replace = c("game","film","cd")
I tried two approaches, but neither of them worked:
for(i in replace){tm_map(corpus,toString,paste("c",get(i),sep=""),get(i))}
and
for(i in 1:length(replace)){tm_map(corpus,toString,paste("c",replace[i],sep=""),replace[i])}
In the first case R tells me that it can't find the object requested by get(i): Error in get(i) : object 'game' not found.
In the second, there is no error message, but nothing changes within the corpus.
How can I access the items of a vector as strings, so that the for-loop repeats for all the terms what I did with corpus = tm_map(corpus,toString,"cgame","game")?
The tm_map function doesn't modify the corpus in place; it returns a modified corpus. Right now you are not doing anything to save the result. Try assigning the return value of tm_map back to corpus on each iteration of the loop.
Also, toString is actually the name of a function in base R, so it's not a good idea to write your own function with the same name.
Finally, the get() function returns the value of an R variable with the same name as the character value you pass it. There is no reason to use get() here, since you want to continue working with strings and not variable names.
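Putting the three points together, the loop could look roughly like the sketch below. It reuses the corpus and the replace vector from the question; the name replaceTerm is only an illustrative choice (not something from tm) to avoid masking base R's toString, and paste0("c", term) is simply shorthand for paste("c", term, sep="").

```r
library(tm)

corpus <- Corpus(VectorSource(c("cfilm,cgame,ccd", "cd,film,cfilm")))

# Hypothetical name, chosen so that base::toString is not masked
replaceTerm <- content_transformer(function(x, from, to) gsub(from, to, x))

replace <- c("game", "film", "cd")

for (term in replace) {
  # term is already a string, so no get() is needed.
  # tm_map returns a new corpus, so assign it back on every pass.
  corpus <- tm_map(corpus, replaceTerm, paste0("c", term), term)
}

# Inspect the text of each document
print(sapply(corpus, as.character))
```

With the sample data, the two documents should end up as "film,game,cd" and "cd,film,film". Iterating directly over the elements of replace avoids indexing with 1:length(replace), which misbehaves when the vector is empty.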