How to create the bigram matrix?

Question

2.2k views Asked by marysd At 08 June 2015 at 19:47

I want to build a matrix of bigram counts for my bigram model. How can I do it? Suggestions that fit my code below would be appreciated.

 import codecs
 from collections import Counter

 import nltk

 # Collect the tokens from every line of the file.
 tokens = []
 with codecs.open("Pezeshki339.txt", "r", "utf8") as file:
     for line in file:
         tokens.extend(line.split())

 # 80/20 train/test split.
 spl = 80 * len(tokens) // 100
 train = tokens[:spl]
 test = tokens[spl:]
 print(len(test))
 print(len(train))

 # Remove the rare words (seen only once) and keep the rest, in order, in a list.
 cn = Counter(train)
 known_words = [word for word in train if cn[word] > 1]

 bigram = nltk.bigrams(known_words)
 frequency = nltk.FreqDist(bigram)
 for f in frequency:
     print(f, frequency[f])

I need something like:

        w1        w2        w3        ...  wn
 w1     n(w1w1)   n(w1w2)   n(w1w3)   ...  n(w1wn)
 w2     n(w2w1)   n(w2w2)   n(w2w3)   ...  n(w2wn)
 w3     .
 .
 .
 wn

The same for all rows and columns.

There is 1 answer

**alexis** · Accepted Answer · 2015-06-09T10:59:19+00:00

Since you need a "matrix" indexed by words, use a dictionary-like class. The outer dictionary is keyed by the first word of each bigram. To make it two-dimensional, make it a dictionary of dictionaries: each value is another dictionary whose keys are the second words of the bigrams and whose values are whatever you're tracking (probably the number of occurrences).
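Here is a minimal sketch of that dictionary-of-dictionaries idea using only the standard library. The token list and the name `bigram_counts` are made up for illustration; in your case the tokens would come from your own file.

```python
from collections import Counter, defaultdict

# Toy token sequence (illustrative only; replace with your own tokens).
tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "cat", "ran"]

# Outer key: first word of the bigram.
# Inner Counter: second word -> number of times the pair occurred.
bigram_counts = defaultdict(Counter)
for first, second in zip(tokens, tokens[1:]):
    bigram_counts[first][second] += 1

print(bigram_counts["the"]["cat"])  # how often "the cat" occurs
print(bigram_counts["cat"]["the"])  # a Counter returns 0 for unseen pairs
```

Using a `Counter` for the inner dictionary means you never have to check whether a pair has been seen before you look it up.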

In NLTK you can do it quickly with a ConditionalFreqDist():

 import nltk
 from nltk.corpus import brown  # needs the Brown corpus: nltk.download("brown")

 mybigrams = nltk.ConditionalFreqDist(nltk.bigrams(brown.words()))
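To see what a ConditionalFreqDist gives you, here is a small sketch on a toy token list instead of the Brown corpus, so it needs no corpus download. The tokens and the three-word subset passed to `tabulate()` are assumptions chosen only to keep the output small.

```python
import nltk

# Toy token sequence (illustrative only; replace with your own tokens).
tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "cat", "ran"]

# Conditions are the first words of the bigrams; each condition maps to a
# FreqDist over the words that followed it.
cfd = nltk.ConditionalFreqDist(nltk.bigrams(tokens))

print(cfd["the"]["cat"])         # count of the bigram ("the", "cat")
print(cfd["the"].most_common())  # every word seen after "the", with counts

# Print a rows-by-columns table of counts for a chosen subset of words.
words = ["the", "cat", "sat"]
cfd.tabulate(conditions=words, samples=words)
```

Lookups work like the nested dictionary: `cfd[w1][w2]` is the count for the bigram `(w1, w2)`.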

But I recommend you build your bigram table step by step. You'll understand it better, and you need to understand it before you can use it.
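One way to go step by step, if you really want the rows-and-columns layout from the question, is sketched below with plain lists. The toy tokens and the names `vocab`, `index` and `matrix` are assumptions for illustration, not part of your code.

```python
from collections import Counter

# Toy token sequence (illustrative only; replace with your own tokens).
tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "cat", "ran"]

# Step 1: fix an ordering of the vocabulary so rows and columns line up.
vocab = sorted(set(tokens))
index = {word: i for i, word in enumerate(vocab)}

# Step 2: count each adjacent pair of tokens.
pair_counts = Counter(zip(tokens, tokens[1:]))

# Step 3: fill an n-by-n matrix; matrix[i][j] is the count of (vocab[i], vocab[j]).
n = len(vocab)
matrix = [[0] * n for _ in range(n)]
for (first, second), count in pair_counts.items():
    matrix[index[first]][index[second]] = count

# Step 4: print the matrix with row and column labels.
width = max(len(w) for w in vocab) + 2
print(" " * width + "".join(w.rjust(width) for w in vocab))
for word, row in zip(vocab, matrix):
    print(word.ljust(width) + "".join(str(c).rjust(width) for c in row))
```

Keep in mind that a dense matrix has one cell per pair of vocabulary words, so it grows with the square of the vocabulary size and is mostly zeros. For a real corpus the nested-dictionary form is usually the more practical one.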
