I need to generate features from text. The script below is available online, but I do not know how to run it because I do not know Python at all. I have a text file called train.txt that contains the following:
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
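Each line above holds three space-separated values, which the script's `fields = 'w pos y'` setting names: the word, its part-of-speech tag, and its chunk label. The sketch below (Python 3) shows one way such a file could be parsed into per-token dictionaries. `read_sequences` is a hypothetical helper, not part of crfutils. Treating a blank line as the end of a sentence is an assumption based on the CoNLL-2000 convention, because the sample shows only one sentence.

```python
# Sketch: parse "word POS chunk-label" lines into one dict per token,
# keyed by the script's field names 'w pos y'.
# Assumption: a blank line ends a sentence (CoNLL-2000 convention).
# read_sequences is a hypothetical helper, not part of crfutils.

FIELDS = "w pos y".split()


def read_sequences(lines, fields=FIELDS, sep=" "):
    """Yield one list of token dicts per sentence."""
    seq = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # Blank line: the current sentence is complete.
            if seq:
                yield seq
                seq = []
            continue
        values = line.split(sep)
        if len(values) != len(fields):
            raise ValueError(
                f"expected {len(fields)} fields, got {len(values)}: {line!r}"
            )
        seq.append(dict(zip(fields, values)))
    if seq:
        # The last sentence may not be followed by a blank line.
        yield seq


if __name__ == "__main__":
    sample = ["He PRP B-NP\n", "reckons VBZ B-VP\n", "# # I-NP\n", ". . O\n"]
    for sentence in read_sequences(sample):
        print(sentence[0])  # {'w': 'He', 'pos': 'PRP', 'y': 'B-NP'}
```

A line with the wrong number of fields raises an error instead of being silently skipped, which makes malformed input easier to spot.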
I also have a Python script that converts the above text into features such as the following:
B-NP w[0]=He w[1]=reckons w[2]=the w[0]|w[1]=He|reckons pos[0]=PRP pos[1]=VBZ pos[2]=DT pos[0]|pos[1]=PRP|VBZ pos[1]|pos[2]=VBZ|DT pos[0]|pos[1]|pos[2]=PRP|VBZ|DT __BOS__
...
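Each attribute in that output comes from one template: a tuple of (field, offset) pairs such as `(('w', 0), ('w', 1))`, where the offset is relative to the current token. The sketch below re-implements that mapping for illustration only. `apply_templates_sketch` is a hypothetical name, and its behaviour is inferred from the sample output (for example, "He" gets no `w[-1]` attribute because there is no previous word), so the real `crfutils.apply_templates` may differ in details.

```python
# Sketch of how a template such as (('w', -1), ('w', 0)) becomes an attribute
# like "w[-1]|w[0]=He|reckons". Written for illustration; apply_templates_sketch
# is a hypothetical name, not crfutils' own code.


def apply_templates_sketch(X, templates):
    """Append attribute strings to X[t]['F'] for every position t."""
    for template in templates:
        # Attribute name, e.g. "pos[0]|pos[1]".
        name = "|".join(f"{field}[{offset}]" for field, offset in template)
        for t in range(len(X)):
            values = []
            for field, offset in template:
                p = t + offset
                if not 0 <= p < len(X):
                    # Offset falls outside the sentence: skip this attribute.
                    values = []
                    break
                values.append(X[p][field])
            if values:
                X[t]["F"].append(f"{name}={'|'.join(values)}")


if __name__ == "__main__":
    X = [
        {"w": "He", "pos": "PRP", "y": "B-NP", "F": []},
        {"w": "reckons", "pos": "VBZ", "y": "B-VP", "F": []},
    ]
    apply_templates_sketch(X, ((("w", -1),), (("w", 0), ("w", 1))))
    print(X[0]["F"])  # ['w[0]|w[1]=He|reckons']
    print(X[1]["F"])  # ['w[-1]=He']
```

Skipping out-of-range offsets, rather than inserting a placeholder value, is what keeps attributes like `w[-1]` off the first token in the sample output.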
The Python script is:
# Separator of field values.
separator = ' '

# Field names of the input data.
fields = 'w pos y'

# Attribute templates.
templates = (
    (('w', -2), ),
    (('w', -1), ),
    (('w', 0), ),
    (('w', 1), ),
    (('w', 2), ),
    (('w', -1), ('w', 0)),
    (('w', 0), ('w', 1)),
    (('pos', -2), ),
    (('pos', -1), ),
    (('pos', 0), ),
    (('pos', 1), ),
    (('pos', 2), ),
    (('pos', -2), ('pos', -1)),
    (('pos', -1), ('pos', 0)),
    (('pos', 0), ('pos', 1)),
    (('pos', 1), ('pos', 2)),
    (('pos', -2), ('pos', -1), ('pos', 0)),
    (('pos', -1), ('pos', 0), ('pos', 1)),
    (('pos', 0), ('pos', 1), ('pos', 2)),
)

import crfutils

def feature_extractor(X):
    # Apply attribute templates to obtain features (in fact, attributes)
    crfutils.apply_templates(X, templates)
    if X:
        # Append BOS and EOS features manually
        X[0]['F'].append('__BOS__')     # BOS feature
        X[-1]['F'].append('__EOS__')    # EOS feature

if __name__ == '__main__':
    crfutils.main(feature_extractor, fields=fields, sep=separator)
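After the templates run, `feature_extractor` appends `__BOS__` to the first token of a sentence and `__EOS__` to the last, which is why `__BOS__` ends the first line of the sample output. The sketch below mimics that step and then writes one line per token, label first. `add_boundary_features` and `format_sequence` are hypothetical names, and separating the label and attributes with tabs is an assumption based on CRFsuite's data format (the sample in the question shows spaces).

```python
# Sketch: the BOS/EOS step from feature_extractor, plus one way to write each
# token as "label<TAB>attr<TAB>attr...". Hypothetical helpers, not crfutils' API.
# Assumption: each token dict has 'y' (label) and 'F' (list of attributes).


def add_boundary_features(X):
    """Mark the first and last token of the sentence, as the script does."""
    if X:
        X[0]["F"].append("__BOS__")   # beginning of sentence
        X[-1]["F"].append("__EOS__")  # end of sentence


def format_sequence(X):
    """Return one output line per token; the trailing empty line ends the sentence."""
    lines = ["\t".join([tok["y"]] + tok["F"]) for tok in X]
    lines.append("")  # blank line separates sentences
    return "\n".join(lines)


if __name__ == "__main__":
    X = [
        {"y": "B-NP", "F": ["w[0]=He", "pos[0]=PRP"]},
        {"y": "O", "F": ["w[0]=.", "pos[0]=."]},
    ]
    add_boundary_features(X)
    print(format_sequence(X))
```

For a one-word sentence, both markers end up on the same token, since it is both the first and the last.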
Both script.py and crfutils.py are in the same folder. I run the script from cmd on Windows 7 as follows:
C:\>Python script.py train.txt > train.result.txt
I get an empty file named train.result.txt. Because I am new to Python (I have only just started learning it), I do not know what the problem is. Am I providing the arguments in the wrong order? Is the format of train.txt wrong?
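`crfutils.main` reads the data from standard input (stdin) and writes the features to standard output, so you need to pass train.txt on stdin, not as a command-line argument: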