NLTK saving trained Brill's model

Question

Asked by humble_fool on 09 December 2024 at 00:13

I am training a Brill POS tagger that uses the CRFTagger NLTK provides on top of python-crfsuite as its baseline. However, when I try to save the trained model, I get the following error:

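import pickle
import yaml
import nltk
from nltk.tag import CRFTagger

# train_sents: list of tagged sentences, e.g. [[('The', 'DT'), ('dog', 'NN')], ...]
crf_tagger = CRFTagger()
crf_tagger.train(train_sents, 'model_trained.crf.tagger')  # baseline CRF model
templates = nltk.tag.brill.nltkdemo18()
trainer = nltk.tag.brill_trainer.BrillTaggerTrainer(crf_tagger, templates)
bt = trainer.train(train_sents, max_rules=10)  # the trained Brill tagger

# file() is Python 2 only; open() works in Python 3
with open('trained_brill_tagger.yaml', 'w') as file_writing:
    yaml.dump(bt, file_writing)

# even pickle fails
with open('trained_brills.pickle', 'wb') as file_w:
    pickle.dump(bt, file_w)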

File "stringsource", line 2, in pycrfsuite._pycrfsuite.Tagger.__reduce_cython__
TypeError: self.c_tagger cannot be converted to a Python object for pickling

I have tried pickle, dill and yaml, but the error persists. Is there a solution to this? Is it because I am using the CRF tagger as the baseline? Thank you.
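One way to test the suspicion that the CRF baseline is to blame is to try pickling each attribute of the trained tagger on its own and report the ones that fail. The helper `find_unpicklable` below is a hypothetical sketch using only the standard library. `FakeCRF` and `FakeBrill` are stand-ins (assumptions, not NLTK classes), and a `threading.Lock` plays the role of the Cython `pycrfsuite.Tagger`, since neither can be pickled.

```python
import pickle
import threading


def find_unpicklable(obj, path="obj", max_depth=3):
    """Return attribute paths whose values fail to pickle.

    Recurses into obj.__dict__ (up to max_depth levels) and reports the
    deepest failing attributes it can reach; if an object fails but none of
    its attributes fail on their own, the object itself is reported.
    """
    try:
        pickle.dumps(obj)
        return []  # the whole object pickles fine
    except Exception:
        pass
    attrs = getattr(obj, "__dict__", None)
    if not attrs or max_depth == 0:
        return [path]  # cannot dig deeper, so blame this object
    problems = []
    for name, value in attrs.items():
        problems.extend(find_unpicklable(value, f"{path}.{name}", max_depth - 1))
    return problems or [path]


class FakeCRF:
    """Hypothetical stand-in for CRFTagger."""

    def __init__(self):
        self._model_file = "model_trained.crf.tagger"
        self._tagger = threading.Lock()  # plays the role of pycrfsuite.Tagger


class FakeBrill:
    """Hypothetical stand-in for the trained BrillTagger."""

    def __init__(self, initial_tagger):
        self._initial_tagger = initial_tagger
        self._rules = ("rule-1", "rule-2")


print(find_unpicklable(FakeBrill(FakeCRF())))
# ['obj._initial_tagger._tagger']
```

Run against the real `bt`, the report should include a path into the baseline tagger if the CRF model is the culprit; the exact attribute names depend on the NLTK version, so treat them as something to inspect rather than a fixed answer.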

Original Q&A

There are 2 answers

**alvas** · Answer 1 · 2018-02-12T10:33:00+00:00

Here's an example of how you can create an nltk.tag.brill_trainer.BrillTaggerTrainer in NLTK v3.2.5:

from nltk.corpus import treebank

from nltk.tag import BrillTaggerTrainer, UnigramTagger
from nltk.tbl.demo import REGEXP_TAGGER, _demo_prepare_data
from nltk.tag.brill import brill24

# Settings borrowed from NLTK's nltk.tbl.demo module
baseline_backoff_tagger = REGEXP_TAGGER
templates = brill24()
tagged_data = treebank.tagged_sents()  # needs nltk.download('treebank')
train = 0.8  # fraction of the sentences used for training
trace = 3  # verbosity of the trainer's progress output
num_sents = 1000  # number of sentences taken from the corpus
randomize = False
separate_baseline_data = False

# Split the corpus into training, baseline, gold and test sets
(training_data, baseline_data, gold_data, testing_data) = \
    _demo_prepare_data(tagged_data, train, num_sents, randomize, separate_baseline_data)

# Unigram baseline that backs off to the regexp tagger for unseen words
baseline_tagger = UnigramTagger(baseline_data, backoff=baseline_backoff_tagger)

# creating a Brill tagger trainer
trainer = BrillTaggerTrainer(baseline_tagger, templates, trace, ruleformat="str")

Then, to save the trainer, simply pickle it:

import pickle
with open('brill-demo.pkl', 'wb') as fout:
    pickle.dump(trainer, fout)
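Note that this pickles the trainer, which has not learned any rules yet; the trained model is the BrillTagger returned by `trainer.train(...)` (`bt` in the question), and that is the object to save. Reading a pickle back is the mirror image: `pickle.load` on a file opened in `"rb"` mode. The helpers `save_model` and `load_model` below are hypothetical names, and a plain dict stands in for the tagger so the sketch runs without NLTK or the treebank corpus.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Write any picklable object (e.g. a trained tagger) to path."""
    with open(path, "wb") as fout:
        pickle.dump(model, fout, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Read back an object written by save_model; only load trusted files."""
    with open(path, "rb") as fin:
        return pickle.load(fin)


# Round trip with a placeholder dict standing in for a trained tagger.
placeholder = {"rules": ["rule-1", "rule-2"], "max_rules": 10}
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "brill-demo.pkl")
    save_model(placeholder, model_path)
    assert load_model(model_path) == placeholder
```

With NLTK installed and the answer's variables in scope, the same helper should apply to the trained tagger, e.g. `save_model(trainer.train(training_data, max_rules=10), 'brill-demo.pkl')`.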

**humble_fool** · Answer 2 · 2018-02-12T10:41:38+00:00

I realized the issue lies in the CRFTagger module. If I use a different initial tagger with the Brill trainer, the error does not occur and the model is saved.

trainer = nltk.tag.brill_trainer.BrillTaggerTrainer(baseline_tagger, templates)

I was unable to save the trained model when baseline_tagger was a CRFTagger() object. Using something like an NgramTagger instead solves the issue, for some reason.
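The likely reason is visible in the question's traceback: CRFTagger keeps its model in a `pycrfsuite.Tagger`, a Cython extension object holding a raw C pointer (`c_tagger`), and the Cython-generated `__reduce_cython__` refuses to pickle it. N-gram taggers store ordinary Python dicts, so they pickle cleanly. To keep the CRF baseline, one common pattern is to pickle only the path of the saved model file and reopen it after loading, via `__getstate__`/`__setstate__`. The sketch below shows that pattern with the standard library only; `ModelHandle` and `PicklableTagger` are hypothetical, and a `threading.Lock` stands in for the unpicklable C handle.

```python
import pickle
import threading


class ModelHandle:
    """Hypothetical stand-in for pycrfsuite.Tagger.

    The threading.Lock plays the role of the C-level c_tagger pointer:
    pickle refuses to serialize it, just as Cython refuses c_tagger.
    """

    def __init__(self, model_file):
        self.model_file = model_file
        self._lock = threading.Lock()

    def tag(self, tokens):
        # Dummy tagging so the restored object is visibly usable.
        return [(token, "NN") for token in tokens]


class PicklableTagger:
    """Hypothetical wrapper that pickles the model path, not the handle."""

    def __init__(self, model_file):
        self.model_file = model_file
        self._handle = ModelHandle(model_file)

    def tag(self, tokens):
        return self._handle.tag(tokens)

    def __getstate__(self):
        # Copy the instance dict and leave out the unpicklable handle.
        state = self.__dict__.copy()
        del state["_handle"]
        return state

    def __setstate__(self, state):
        # Restore plain attributes, then reopen the handle from the saved path.
        self.__dict__.update(state)
        self._handle = ModelHandle(self.model_file)


try:
    pickle.dumps(ModelHandle("model_trained.crf.tagger"))
except TypeError as exc:
    print("raw handle fails:", exc)

restored = pickle.loads(pickle.dumps(PicklableTagger("model_trained.crf.tagger")))
print(restored.tag(["Brill", "rules"]))
# [('Brill', 'NN'), ('rules', 'NN')]
```

Applied to NLTK, the analogous CRFTagger subclass would drop the attribute holding the `pycrfsuite.Tagger` (believed to be the private `_tagger`; verify against your installed version) in `__getstate__`, and in `__setstate__` create a fresh `pycrfsuite.Tagger` and reopen the model with `set_model_file()`. That is an untested assumption about NLTK internals, and the model file (here `model_trained.crf.tagger`) must still exist at load time.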

TechQA.
