Reading a lot of text files into R

I have a lot of text files that represent messages. I want to analyze them with the tm package, so I first need to get them into R. What is an efficient way to read all the words in the messages into R? Something like:

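txts <- Sys.glob("*.txt")
data <- NULL  # start empty; rbind(NULL, df) simply returns df
for (f in txts) {
  # scan() splits on whitespace, so each element is one word
  tempData <- data.frame(word = scan(f, what = "character", quiet = TRUE),
                         stringsAsFactors = FALSE)
  data <- rbind(data, tempData)  # copies the whole data frame on every pass
}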

simply takes forever and doesn't work very well. How do I read all the complete words in all the files and get them into R quickly?
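One possible direction, sketched under assumptions: the slowdown most likely comes from growing a data frame with rbind() inside the loop, so reading each file into a list with lapply() and flattening once at the end should scale much better. The helper name `read_words` is hypothetical, not something from tm or base R, and `quote = ""` is my choice so that apostrophes in the messages are not treated as quote characters.

```r
# Hypothetical helper: read every whitespace-separated word from a set of files.
read_words <- function(files) {
  per_file <- lapply(files, function(f) {
    # what = character() reads tokens as strings; quote = "" keeps ' and " literal
    scan(f, what = character(), quote = "", quiet = TRUE)
  })
  names(per_file) <- basename(files)  # label each element with its file name
  per_file
}

# Usage: one character vector per file, then a single flat vector of all words.
files <- Sys.glob("*.txt")
words_by_file <- read_words(files)
all_words <- unlist(words_by_file, use.names = FALSE)
```

If the end goal is a tm corpus rather than a word vector, it may be simpler to let tm read the directory itself, for example with `VCorpus(DirSource(".", pattern = "\\.txt$"))`, and skip the manual loop altogether.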

Bonus trickery: Some of the files seem to have been generated weirdly and now have some words split up, with each letter on its own line, like

  h
  e
  l
  l
  o

Is there a way to either ignore words that are really short (already when reading them into R) or to make R string them all together?
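Both options seem doable after reading the raw lines. Below is a rough sketch of the "string them together" variant, assuming a broken word always shows up as a run of consecutive lines holding exactly one character each (after trimming whitespace). The helper name `join_single_letters` is made up for illustration. Note the caveat: genuine one-letter words such as "a" or "I" that happen to sit alone on adjacent lines would be merged too.

```r
# Hypothetical helper: merge runs of consecutive one-character lines into one word.
join_single_letters <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop empty lines
  if (length(lines) == 0) return(character(0))
  single <- nchar(lines) == 1
  prev_single <- c(FALSE, head(single, -1))
  # start a new group unless this line continues a run of single letters
  group <- cumsum(!(single & prev_single))
  unname(vapply(split(lines, group), paste, character(1), collapse = ""))
}

lines <- c("hi there", "  h", "  e", "  l", "  l", "  o", "bye")
joined <- join_single_letters(lines)
words <- unlist(strsplit(joined, "\\s+"))
# words is now: "hi" "there" "hello" "bye"
```

For a real file, `lines` would come from `readLines(f)`. The simpler alternative of ignoring very short words is a one-liner on the resulting vector, e.g. `words[nchar(words) > 1]`, at the cost of also dropping legitimate one-letter words.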
