How can I convert a list of sentences to IOB format while keeping the sentence separation in the output?


1.9k views · Asked by Khrystyna Kosenko on 27 December 2016 at 13:52

I have a txt file that I need to convert to IOB format for a CRF model.

Using NLTK's tree2conlltags, I can convert tokenized, POS-tagged text into the IOB format that I need, like this:

("u'Is", 'JJ', u'O')
('Miami', 'NNP', u'B-PERSON')
('playing', 'NN', u'O')
('in', 'IN', u'O')
('Washigthon', 'NNP', u'B-GPE')
('this', 'DT', u'O')
('month', 'NN', u'O')
('?', '.', u'O')

The problem is that the output has one word per element, but I need one sentence per element.

I also tried splitting the text into sentences first and then tokenizing them, so that the sentence boundaries are preserved, but the NLTK POS tagger doesn't accept list-type data.

Maybe there is a whole new approach to get the format I need or


**lenz** · Answer 1 · 2016-12-28T12:12:01+00:00

It's easy to concatenate the tokens, PoS tags and NER labels into one string each for every sentence, e.g. like this (token_wise is the data from your example):

>>> tuple(' '.join(layer) for layer in zip(*token_wise))
("u'Is Miami playing in Washigthon this month ?",
 'JJ NNP NN IN NNP DT NN .',
 'O B-PERSON O O B-GPE O O O')

You'd have to repeat that for each sentence. But it doesn't make any sense: your CRF tagger will have no chance of predicting a complex label like 'O B-PERSON O O B-GPE O O O', because you'll have a huge sparse-data problem. Most labels will be seen only once, and the input sentences even more so.

Also, this is not IOB format. In IOB, you have either I, O, or B per element, but not a combination of them.
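
If the goal is only to keep the sentence boundaries, one option is to keep the per-token labels and group them by sentence: a list of sentences, each a list of (token, PoS, IOB) triples. Written to a file, that is the usual CoNLL-style layout with a blank line between sentences. Below is a minimal sketch, assuming Python 3 and that the NLTK tokenizer, tagger and NE chunker data are installed. The names iob_by_sentence and to_conll are made up for this illustration, and the second example sentence is invented.

```python
def iob_by_sentence(text):
    """Return one list of (token, pos, iob) triples per sentence.

    Assumes NLTK and its tokenizer, tagger and NE chunker data are installed.
    """
    import nltk
    from nltk.chunk import tree2conlltags

    result = []
    for sentence in nltk.sent_tokenize(text):
        tokens = nltk.word_tokenize(sentence)   # list of token strings
        tagged = nltk.pos_tag(tokens)           # pos_tag takes ONE tokenized sentence
        tree = nltk.ne_chunk(tagged)            # named-entity chunk tree
        result.append(tree2conlltags(tree))     # [(token, pos, iob), ...]
    return result


def to_conll(sentences):
    """Render sentences as 'token pos iob' lines, blank line between sentences."""
    blocks = ["\n".join(" ".join(triple) for triple in sent) for sent in sentences]
    return "\n\n".join(blocks) + "\n"


# Hand-written data in the shape iob_by_sentence returns; the first sentence
# follows the example in the question, the second one is invented.
example = [
    [("Is", "JJ", "O"), ("Miami", "NNP", "B-PERSON"), ("playing", "NN", "O"),
     ("in", "IN", "O"), ("Washigthon", "NNP", "B-GPE"), ("this", "DT", "O"),
     ("month", "NN", "O"), ("?", ".", "O")],
    [("Yes", "UH", "O"), (".", ".", "O")],
]
conll_text = to_conll(example)
```

Each inner list is then one training sequence for the CRF, so every element still carries exactly one of the I, O or B labels while the sentence grouping is preserved.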
