In R: How do I iterate through character strings in a loop?


I'm trying to access character strings from a vector in a for-loop.

I have a Corpus like this one:

library(tm)
corpus = Corpus(VectorSource(c("cfilm,cgame,ccd","cd,film,cfilm")))

My goal is to get rid of all unnecessary "c" characters. Note that this means I don't want to remove the c from cd, only the leading c from ccd, cgame and so forth.

I use this function, which, applied to a corpus through tm_map, replaces one term with another:

toString = content_transformer(function(x,from,to)gsub(from, to, x))

So, for example, to replace cgame with game, I use

corpus = tm_map(corpus,toString,"cgame","game")
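content_transformer() and tm_map() come from the tm package. To show what the inner function does on its own, the base-R sketch below applies the same gsub() call to a plain character vector standing in for the two documents. replace_term is a hypothetical name chosen for illustration (it also avoids reusing the name toString).

```r
# Hypothetical stand-in for the function wrapped by content_transformer():
# replace every match of 'from' with 'to' in the text x
replace_term <- function(x, from, to) gsub(from, to, x)

# Plain character vector in place of the corpus documents
docs <- c("cfilm,cgame,ccd", "cd,film,cfilm")

# Same replacement as tm_map(corpus, toString, "cgame", "game")
docs <- replace_term(docs, "cgame", "game")
print(docs)
```

Because gsub() treats from as a regular expression, adding fixed = TRUE is one option when the terms are always literal strings.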

Now, instead of repeating this line for every term, I'd like to use a loop that iterates over all the possible replacements, using a vector of the relevant terms:

replace = c("game","film","cd")
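As an alternative to looping (a different technique from the one asked about), a single regular expression can strip the leading "c" from all listed terms at once. This sketch works on a plain character vector rather than a tm corpus, and it assumes the terms are whole comma-separated words that contain no regex metacharacters; the variable names are illustrative.

```r
docs <- c("cfilm,cgame,ccd", "cd,film,cfilm")
replace <- c("game", "film", "cd")

# Build one pattern, here "\\bc(game|film|cd)\\b": a "c" at the start of a
# word, followed by one of the terms and then a word boundary
pattern <- paste0("\\bc(", paste(replace, collapse = "|"), ")\\b")

# Keep only the captured term, dropping the leading "c";
# a bare "cd" is left alone because "d" does not start any listed term
docs <- gsub(pattern, "\\1", docs, perl = TRUE)
print(docs)
```

Unlike a loop of plain substring replacements, the word boundaries mean that a string such as "cfilmx" would not be altered.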

I tried two approaches, but neither of them works:

for(i in replace){tm_map(corpus,toString,paste("c",get(i),sep=""),get(i))}

and

for(i in 1:length(replace)){tm_map(corpus,toString,paste("c",replace[i],sep=""),replace[i])}

In the first case, R tells me that it can't find the object looked up by get(i): Error in get(i) : object 'game' not found. In the second case there is no error message, but nothing in the corpus changes.

How can I access the items of a vector as strings, so that the for-loop repeats for every term what I did with corpus = tm_map(corpus,toString,"cgame","game")?

Answer by MrFlick (accepted):

The tm_map function doesn't modify the corpus in place; it returns a modified corpus. Right now you are not saving that result anywhere. Try

    for (i in seq_along(replace)) {
      # tm_map() returns a new corpus, so assign the result back to corpus
      corpus <- tm_map(corpus, toString, paste0("c", replace[i]), replace[i])
    }
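The same point can be checked without tm: gsub(), like tm_map(), returns a new object instead of modifying its input. This base-R sketch uses a plain character vector in place of the corpus; docs and original are illustrative names.

```r
docs <- c("cfilm,cgame,ccd", "cd,film,cfilm")
original <- docs
replace <- c("game", "film", "cd")

# Without assignment: each gsub() result is computed and then discarded
for (i in seq_along(replace)) {
  gsub(paste0("c", replace[i]), replace[i], docs)
}
print(identical(docs, original))  # TRUE: nothing changed

# With assignment: each iteration builds on the previous result
for (i in seq_along(replace)) {
  docs <- gsub(paste0("c", replace[i]), replace[i], docs)
}
print(docs)
```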

Also, toString is actually the name of a function in base R, so it's not a good idea to give your own function the same name.
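A short base-R sketch of why the name clash matters: once a toString function is defined in the global environment, plain calls to toString() find it before base::toString(). The custom definition here is a simplified stand-in for the question's transformer, without content_transformer().

```r
# Simplified stand-in for the question's function, shadowing base::toString
toString <- function(x, from, to) gsub(from, to, x)

# A normal-looking call now hits the custom version, which needs 'from' and 'to'
res <- tryCatch(toString(1:3), error = function(e) conditionMessage(e))
print(res)

# The base version is still reachable through its namespace
print(base::toString(1:3))

# Removing the global definition restores the usual lookup
rm(toString)
print(toString(1:3))
```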

Finally, the get() function returns the value of the R variable whose name is the character string you pass it. There is no reason to use get() here, since you want to keep working with the strings themselves, not with variable names.
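A minimal base-R sketch of the difference between a string and get() applied to that string; the variable game is created only to illustrate the lookup.

```r
replace <- c("game", "film", "cd")

# get() looks up a *variable* whose name is the given string
game <- "value of a variable named game"
print(get("game"))

# Without such a variable, get() fails, as in the question
rm(game)
err <- tryCatch(get("game"), error = function(e) conditionMessage(e))
print(err)

# The loop variable already holds the string, so it can be used directly
patterns <- character(0)
for (i in replace) {
  patterns <- c(patterns, paste0("c", i))
}
print(patterns)
```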