Numeric conversion of textual features in crfsuite

Question

Numeric conversion of textual features in crfsuite

136 views Asked by Ishan Dixit At 23 May 2020 at 13:13

I was looking at the example code provided in the docs of crfsuite-python and it has the following code for feature defining.

def word2features(sent, i):
word = sent[i][0]
postag = sent[i][1]

features = [
    'bias',
    'word.lower=' + word.lower(),
    'word[-3:]=' + word[-3:],
    'word[-2:]=' + word[-2:],
    'word.isupper=%s' % word.isupper(),
    'word.istitle=%s' % word.istitle(),
    'word.isdigit=%s' % word.isdigit(),
    'postag=' + postag,
    'postag[:2]=' + postag[:2],
]
if i > 0:
    word1 = sent[i-1][0]
    postag1 = sent[i-1][1]
    features.extend([
        '-1:word.lower=' + word1.lower(),
        '-1:word.istitle=%s' % word1.istitle(),
        '-1:word.isupper=%s' % word1.isupper(),
        '-1:postag=' + postag1,
        '-1:postag[:2]=' + postag1[:2],
    ])
else:
    features.append('BOS')
    
if i < len(sent)-1:
    word1 = sent[i+1][0]
    postag1 = sent[i+1][1]
    features.extend([
        '+1:word.lower=' + word1.lower(),
        '+1:word.istitle=%s' % word1.istitle(),
        '+1:word.isupper=%s' % word1.isupper(),
        '+1:postag=' + postag1,
        '+1:postag[:2]=' + postag1[:2],
    ])
else:
    features.append('EOS')
            
return features

I understand that features such as isupper() can be either 0 or 1 but for features such as word[-2:] which are characters ,how are they converted to numeric terms?

Original Q&A

There are 1 answers

**Mandil Karki** · Accepted Answer · 2021-07-01T04:12:11+00:00

CRF trains upon sequence of input data to learn transitions from one state (label) to another. To enable such an algorithm, we need to define features which take into account different transitions. In the function word2features() below, we transform each word into a feature dictionary depicting the following attributes or features:

lower case of word
suffix containing last 3 characters
suffix containing last 2 characters
flags to determine upper-case, title-case, numeric data and POS tag

We also attach attributes related to previous and next words or tags to determine beginning of sentence (BOS) or end of sentence (EOS)

TechQA.

Numeric conversion of textual features in crfsuite

There are 1 answers

Related Questions in PYTHON

Related Questions in MACHINE-LEARNING

Related Questions in NLP

Related Questions in NAMED-ENTITY-RECOGNITION

Related Questions in PYTHON-CRFSUITE

Popular Questions

Popular Tags

Trending Questions