Calculate readability scores for several files with R


I would like to calculate readability scores in R 3.3.2 (RStudio 3.4 for Windows) with the koRpus package for several .txt files and save the results to Excel, SQLite3, or a text file. Right now I can only calculate the readability scores for a single file and print them to the console. I tried to extend the code with a loop over the directory, but it does not work correctly.

library(koRpus)
library(tm)

#Loop through files
path = "D://Reports"
out.file<-""
file.names <- dir(path, pattern =".txt")
for(i in 1:length(file.names)){
  file <- read.table(file.names[i],header=TRUE, sep=";", stringsAsFactors=FALSE)
  out.file <- rbind(out.file, file)
}

#Only one file
report <- tokenize(txt =file , format = "file", lang = "en")

#SMOG-Index
results_smog <- SMOG(report)
summary(results_smog)

#Flesch/Kincaid-Index
results_fleshkin <- flesch.kincaid(report)
summary(results_fleshkin)

#FOG-Index
results_fog<- FOG(report)
summary(results_fog)
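For reference, the three indices above are, in their textbook form, simple functions of word, sentence and syllable counts. The sketch below uses hypothetical helper names (flesch_reading_ease, flesch_kincaid_grade, gunning_fog, smog_grade) and the standard published coefficients; koRpus may implement variants (for example in how FOG decides which words count as "hard"), so treat it as an illustration, not as koRpus's code. It also shows that Flesch Reading Ease (higher means easier) and the Flesch-Kincaid Grade Level are two different scales, which matters when picking a column out of a results table.

```r
# Hypothetical helpers: textbook readability formulas from raw counts.
flesch_reading_ease <- function(words, sentences, syllables) {
  206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
}
flesch_kincaid_grade <- function(words, sentences, syllables) {
  0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
}
gunning_fog <- function(words, sentences, hard_words) {
  # hard_words: words with three or more syllables
  0.4 * (words / sentences + 100 * hard_words / words)
}
smog_grade <- function(sentences, polysyllables) {
  # polysyllables: words with three or more syllables
  1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
}

# Made-up counts: 100 words, 5 sentences, 150 syllables,
# 10 words with three or more syllables
fre  <- flesch_reading_ease(100, 5, 150)
fkgl <- flesch_kincaid_grade(100, 5, 150)
fog  <- gunning_fog(100, 5, 10)
smog <- smog_grade(5, 10)
print(round(c(FRE = fre, FKGL = fkgl, FOG = fog, SMOG = smog), 2))
```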

Accepted answer, by Curious George:

I ran into this same problem, found your post while searching Stack Overflow for a solution, and after some trial and error came up with the code below. It worked fine for me. I stripped out all the extra info. To find the positions of the scores I was looking for, I first ran it on one file and looked at the summary of the readability wrapper output, which gives you a table with a bunch of different values. Match the index (row) with the column you want, and you get the position of the specific number to look for. There are lots of different options.
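Looking a score up by its name is less fragile than hard-coding its row position, since positions can shift between package versions. The sketch below assumes (unverified) that summary() of a koRpus readability object returns a data frame with an "index" column naming each index and a "raw" column holding the scores; check names(summary(readbl.txt)) and the exact index labels in your installation first. The helper get_score and the labels in the stand-in table are hypothetical, and the values are placeholders, not real koRpus output.

```r
# Hypothetical helper: pick one score from a summary-style table by index name.
# ASSUMPTION: the table has an "index" column and a "raw" column.
get_score <- function(summary_df, index_name, column = "raw") {
  hit <- summary_df[[column]][summary_df$index == index_name]
  if (length(hit) != 1) {
    stop("expected exactly one row for index '", index_name,
         "', found ", length(hit))
  }
  # as.character() first, so a factor column is not turned into level codes
  as.numeric(as.character(hit))
}

# Stand-in table with placeholder labels and values (not real koRpus output)
mock_summary <- data.frame(
  index = c("Flesch", "FOG", "SMOG"),
  raw   = c("1.5", "2.5", "3.5"),
  stringsAsFactors = FALSE
)
print(get_score(mock_summary, "FOG"))
```

Because the helper stops when a label is missing or duplicated, a renamed index fails loudly instead of silently returning the wrong number.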

Each document should be a separate text file in the path directory.
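The pattern argument of list.files() (and of its alias dir()) is a regular expression, and without full.names = TRUE you get bare file names that only work if the working directory happens to be that folder. The minimal sketch below uses a temporary directory with hypothetical file names in place of the real report folder to show both points.

```r
# A temporary directory with hypothetical files stands in for 'path'.
demo_dir <- file.path(tempdir(), "readstats_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("First report.", file.path(demo_dir, "report1.txt"))
writeLines("Second report.", file.path(demo_dir, "report2.txt"))
writeLines("a,b", file.path(demo_dir, "mytxt_notes.csv"))

# Unanchored "txt" also matches "mytxt_notes.csv"
loose <- list.files(path = demo_dir, pattern = "txt")

# Anchored to the extension, with full paths that can be passed to tokenize()
ll.files <- list.files(path = demo_dir, pattern = "\\.txt$", full.names = TRUE)
print(basename(ll.files))
```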

library(koRpus)
#koRpus 0.11 and later need the separate English language package
if (packageVersion("koRpus") >= "0.11") library(koRpus.lang.en)

#Path (keep the trailing separator, because output file names are appended with paste0)
path <- "C:\\Users\\Philipp\\SkyDrive\\Documents\\Thesiswork\\ReadStats\\"

#List the .txt files with full paths and show how many were found
ll.files <- list.files(path = path, pattern = "\\.txt$", full.names = TRUE)
length(ll.files)

#Preallocate one numeric vector per score
SMOG.score.vec <- numeric(length(ll.files))
FleschKincaid.score.vec <- numeric(length(ll.files))
FOG.score.vec <- numeric(length(ll.files))

#Loop through each file
for (i in seq_along(ll.files)) {
  #Tokenize
  tagged.text <- koRpus::tokenize(ll.files[i], lang = "en")
  #Hyphenate the words for the indices that need syllable counts
  hyph.txt.en <- koRpus::hyphen(tagged.text)
  #Readability wrapper
  readbl.txt <- koRpus::readability(tagged.text, hyphen = hyph.txt.en, index = "all")
  #Pull the scores from the summary table. Positions 36, 11 and 22 come from running
  #summary(readbl.txt) on one file; check them, since they may differ between koRpus versions.
  #as.character() first, in case the column is a factor.
  raw.scores <- as.character(summary(readbl.txt)$raw)
  SMOG.score.vec[i] <- as.numeric(raw.scores[36])          #SMOG score
  FleschKincaid.score.vec[i] <- as.numeric(raw.scores[11]) #Flesch Reading Ease score
  FOG.score.vec[i] <- as.numeric(raw.scores[22])           #FOG score
  if (i %% 10 == 0)
    cat("finished", i, "\n")
}

#If you want everything in one CSV (one row per file)
df <- data.frame(file = basename(ll.files),
                 FOG = FOG.score.vec,
                 "Flesch Kincaid" = FleschKincaid.score.vec,
                 SMOG = SMOG.score.vec,
                 check.names = FALSE)
write.csv(df, file = paste0(path, "Combo.csv"), row.names = FALSE)

#If you want separate CSVs (write.csv ignores col.names, so name the column in a data frame)
write.csv(data.frame(SMOG = SMOG.score.vec), file = paste0(path, "SMOG.csv"), row.names = FALSE)
write.csv(data.frame(FOG = FOG.score.vec), file = paste0(path, "FOG.csv"), row.names = FALSE)
write.csv(data.frame("Flesch Kincaid" = FleschKincaid.score.vec, check.names = FALSE),
          file = paste0(path, "FK.csv"), row.names = FALSE)
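Since the question also asks about saving to a plain text file, here is a minimal base-R sketch that writes a per-file score table as tab-delimited .txt and reads it back to check the round trip. The data frame holds placeholder values standing in for the vectors filled by the loop, and the output goes to a temporary file rather than the real report folder.

```r
# Placeholder table standing in for the per-file scores from the loop
scores <- data.frame(
  file = c("report1.txt", "report2.txt"),
  FOG  = c(1, 2),
  SMOG = c(3, 4),
  stringsAsFactors = FALSE
)

# Write as tab-delimited text without quotes or row names
out.txt <- file.path(tempdir(), "scores.txt")
write.table(scores, file = out.txt, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the file has the expected shape
back <- read.table(out.txt, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
print(back)
```

For SQLite, the DBI and RSQLite packages offer dbConnect(RSQLite::SQLite(), "scores.sqlite") followed by dbWriteTable(con, "scores", scores); for Excel, packages such as writexl (write_xlsx()) or openxlsx (write.xlsx()) take the same data frame. Those packages are not part of base R and are not exercised in the sketch above.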