Saving ngram objects in a dataframe

require(ngram)
require(stringr)
res<-readLines("text1.txt")
wordlength=0

for(j in 1:length(res) ){
temp<-sapply(strsplit(res[j]," "), length)
 if (temp>=wordlength) {
    wordlength=temp
 }
}

rm("temp")
data<-data.frame

for(i in 1:length(res) ){
x<-res[i]
ng<-ngram(x,n=2)
temp<-babble(ng,genlen=500,seed=123)
data[i]<-ngram(temp,n=2)
}

get.ngrams(ngram(bab[1,],n=2))
babng<-matrix(nrow=length(res),wordlength)

I'm trying to save my ngram data in a data frame from this loop. I've also tried saving it in a matrix, but I get this error:

"object of type 'closure' is not subsettable"

I want to get the frequency distribution of every 2-gram produced by the babbler. Sorry for my messy code; I'm new to R.

There are 3 answers

Jthorpe On

In your code, you call:

data<-data.frame

which assigns the function data.frame itself to the variable data (note the missing parentheses). Later, you call data[i]<-ngram(temp,n=2), which causes the error: the function stored in data (a "closure" in R's terminology) cannot be subsetted with the [ operator. You probably want to create a data.frame object and assign it to the variable data by calling the function data.frame:

data<-data.frame()
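
To make the difference concrete, here is a minimal base R sketch. It reproduces the error without needing the ngram package, then shows one way to collect a result per input line. The names `msg` and `results`, and the placeholder strings stored in the list, are illustrative assumptions and not part of the original code.

```r
# Without parentheses, `data` is bound to the function itself.
data <- data.frame
is.function(data)

# Subsetting a function fails; capture the message instead of stopping.
msg <- tryCatch({
  data[1] <- "x"
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# With parentheses, an (empty) data frame object is created.
data <- data.frame()
is.data.frame(data)

# A list can hold one arbitrary object per input line.
# The strings below are hypothetical stand-ins for per-line results.
results <- vector("list", 3)
for (i in seq_along(results)) {
  results[[i]] <- paste("result for line", i)
}
length(results)
```

A list is suggested here because a data frame expects atomic columns, while a list accepts any object. In the loop from the question, the stored element could be, for example, the frequency table that the ngram package's get.phrasetable() returns for an ngram object, assuming your installed version provides that function.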
user778806 On

Two years late, but here goes.
Setting aside the specifics of your code (which probably reflect the unfamiliarity with R that you mention, and I would guess with quanteda as well), quanteda gives you the bigram frequencies as a data frame directly:

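library(quanteda)
library(quanteda.textstats)  # textstat_frequency() lives here in quanteda >= 3
# dfm(x, ngrams = 2) was the older interface; newer releases form n-grams on tokens first
d1 <- dfm(tokens_ngrams(tokens("simple sample text"), n = 2))
d2 <- textstat_frequency(d1)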
class(d2)
# [1] "frequency"  "textstat"   "data.frame"
d2
#         feature frequency rank docfreq group
# 1 simple_sample         1    1       1   all
# 2   sample_text         1    2       1   all

Unless there is a specific reason not to, the text can be read in one shot. The readtext package, which works well with quanteda, is probably the best choice for that.

Ken Benoit On

This will also do it quite easily:

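require(quanteda)
# tokenize() and toLower() are the older names; current quanteda uses tokens() and char_tolower()
Bigrams <- tokens_ngrams(tokens(char_tolower(res)), n = 2)
as.data.frame(table(Bigrams = unlist(as.list(Bigrams))))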

You can do this for more than one document, if res is a character vector of documents, using

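# Older releases accepted dfm(res, ngrams = 2) and as.data.frame() on a dfm
BigramDfm <- dfm(tokens_ngrams(tokens(res), n = 2))
convert(BigramDfm, to = "data.frame")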
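
If quanteda is not available, the same kind of bigram frequency table can be sketched in base R. This is only an illustration of the idea, not what quanteda does internally: `bigram_freq` is a hypothetical helper, the splitting on whitespace is a deliberately naive tokenizer, and the two-element `res` is made-up sample input standing in for the lines read from the file.

```r
# Hypothetical helper: count adjacent word pairs across a character vector.
bigram_freq <- function(lines) {
  pairs <- unlist(lapply(strsplit(tolower(lines), "\\s+"), function(w) {
    w <- w[nzchar(w)]                        # drop empty strings from leading spaces
    if (length(w) < 2) return(character(0))  # a line needs two words to form a bigram
    paste(head(w, -1), tail(w, -1))          # pair each word with its successor
  }))
  freq <- as.data.frame(table(bigram = pairs), stringsAsFactors = FALSE)
  freq[order(-freq$Freq), ]                  # most frequent bigrams first
}

# Made-up sample input; in the question this would be readLines("text1.txt").
res <- c("a simple sample text", "a simple example")
out <- bigram_freq(res)
out  # "a simple" occurs twice, every other bigram once
```

The result is an ordinary data frame with one row per distinct bigram, so it can be filtered, merged, or written out like any other table.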