Saving ngram objects in a dataframe

Question

Asked by Subhayan Chattopadhyay on 16 June 2015 at 04:42

require(ngram)
require(stringr)
res<-readLines("text1.txt")
wordlength=0

for(j in 1:length(res) ){
temp<-sapply(strsplit(res[j]," "), length)
 if (temp>=wordlength) {
    wordlength=temp
 }
}

rm("temp")
data<-data.frame

for(i in 1:length(res) ){
x<-res[i]
ng<-ngram(x,n=2)
temp<-babble(ng,genlen=500,seed=123)
data[i]<-ngram(temp,n=2)
}

get.ngrams(ngram(bab[1,],n=2))
babng<-matrix(nrow=length(res),wordlength)

I'm trying to save the ngram data from this loop in a data frame. I've also tried saving it in a matrix, but I get this error:

"object of type 'closure' is not subsettable"

I want to get the frequency distribution of every 2-gram produced by the babbler. Sorry for my messy code; I'm new to R.

There are 3 answers

**Jthorpe** · Answer 1 · 2015-06-16T05:48:48+00:00

In your code, you call:

data<-data.frame

which assigns the function data.frame itself to the variable data, because the parentheses are missing. Later, you call data[i]<-ngram(temp,n=2), which causes the error: a function (a "closure" in R's terms) cannot be subsetted with the [ operator. You probably want to call data.frame to create an empty data.frame object and assign that to data:

data<-data.frame()
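
To make the difference concrete, here is a minimal base R sketch. It assumes nothing beyond base R; the `lines` vector and the `table()` calls are hypothetical stand-ins for the asker's text and ngram objects. It also stores the per-line results in a list, on the assumption that a list is generally a safer container than a data frame for arbitrary objects.

```r
# Without parentheses, `data.frame` is the function itself (a closure).
f <- data.frame
is.function(f)

# Subset-assigning into a function fails; capture the message instead of stopping.
err <- tryCatch({ f[1] <- 1; NULL }, error = function(e) conditionMessage(e))
print(err)

# With parentheses, the function is called and returns an empty data frame.
d <- data.frame()
is.data.frame(d)

# A list can hold one arbitrary object per input line.
lines <- c("a b a", "b c")              # hypothetical stand-in for res
results <- vector("list", length(lines))
for (i in seq_along(lines)) {
  # table() stands in for the ngram object built from each line
  results[[i]] <- table(strsplit(lines[i], " ")[[1]])
}
```

Each element can then be retrieved with `results[[i]]` and converted to a data frame separately if needed.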

**user778806** · Answer 2 · 2018-08-09T13:42:52+00:00

A few years late, but ....
Setting aside the specifics of your code, which probably stem from being new to R as you say (and, I would guess, new to Quanteda as well), this gives the bigram frequencies as a data frame:

d1 <- dfm("simple sample text", ngrams = 2)
d2 <- textstat_frequency(d1)
class(d2)
# [1] "frequency"  "textstat"   "data.frame"
d2
#         feature frequency rank docfreq group
# 1 simple_sample         1    1       1   all
# 2   sample_text         1    2       1   all

Unless there is a specific reason not to, the text can be read in one shot. In that case readtext, which works well together with Quanteda, would probably be the best choice.
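
The calls above reflect the Quanteda API at the time of this answer. In quanteda 3 and later, `dfm()` no longer takes an `ngrams` argument and `textstat_frequency()` lives in the separate quanteda.textstats package. The following is a hedged sketch of one way to get the same bigram counts with current quanteda only; `bigram_frequency` is a hypothetical helper name, not part of the package.

```r
# Sketch for quanteda >= 3: tokenize first, then build bigrams, then the dfm.
bigram_frequency <- function(txt) {
  toks    <- quanteda::tokens(txt)
  bigrams <- quanteda::tokens_ngrams(toks, n = 2)   # joined with "_" by default
  d       <- quanteda::dfm(bigrams)
  # topfeatures() returns a named numeric vector of feature counts
  freq    <- quanteda::topfeatures(d, n = quanteda::nfeat(d))
  data.frame(feature = names(freq),
             frequency = as.vector(freq),
             stringsAsFactors = FALSE)
}

# Only run the example if quanteda is installed.
if (requireNamespace("quanteda", quietly = TRUE)) {
  print(bigram_frequency("simple sample text"))
}
```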

**Ken Benoit** · Answer 3 · 2015-07-21T05:25:48+00:00

This will also do it quite easily:

require(quanteda)
Bigrams <- tokenize(toLower(res), ngrams = 2)
as.data.frame(table(Bigrams))

You can do this for more than one document, if res is a character vector of documents, using

BigramDfm <- dfm(res, ngrams = 2)
as.data.frame(BigramDfm)
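
Note that `tokenize()` and `toLower()` come from an early quanteda release and are not available in current versions. For readers who want the same result without depending on a package version, here is a minimal base R sketch of the idea: split each document into words, pair adjacent words, and tabulate. The function name `count_bigrams` and the sample `res` vector are hypothetical, and the sketch splits on whitespace only, so punctuation stays attached to words.

```r
# Count bigrams per document using base R only.
count_bigrams <- function(docs) {
  rows <- lapply(seq_along(docs), function(i) {
    words <- strsplit(tolower(trimws(docs[i])), "[[:space:]]+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < 2) return(NULL)          # no bigram possible
    # Pair each word with its successor
    bigrams <- paste(head(words, -1), tail(words, -1))
    counts  <- table(bigrams)
    data.frame(doc = i,
               bigram = names(counts),
               freq = as.integer(counts),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

res <- c("the cat sat on the mat", "the cat ran")   # stand-in for readLines()
bigram_freq <- count_bigrams(res)

# Total frequency of each bigram across all documents
totals <- aggregate(freq ~ bigram, data = bigram_freq, FUN = sum)
print(totals)
```

Here `bigram_freq` keeps one row per document and bigram, while `totals` collapses the counts over documents.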
