I have multiple text files that need to be tokenised, POS-tagged and NER-tagged. I am using the C&C taggers and have worked through their tutorial, but I am wondering whether there is a way to tag multiple files at once rather than one by one.
At the moment I tokenise each file as follows:
bin/tokkie --input working/tutorial/example.txt --quotes delete --output working/tutorial/example.tok
and then run part-of-speech (POS) tagging:
bin/pos --input working/tutorial/example.tok --model models/pos --output working/tutorial/example.pos
and finally named entity recognition (NER):
bin/ner --input working/tutorial/example.pos --model models/ner --output working/tutorial/example.ner
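In the three commands above, every stage keeps the base name example and only changes the extension (.txt, then .tok, .pos, .ner). In bash the naming needs no extra tooling: the ${var%suffix} parameter expansion strips a trailing suffix, so each output name can be derived from the input name. A minimal sketch, using the tutorial's file as the example input:

```bash
# Derive the three C&C output names from one input path.
input=working/tutorial/example.txt
base=${input%.txt}     # drop the trailing ".txt" -> working/tutorial/example
tok_out=$base.tok      # tokkie output, fed to bin/pos
pos_out=$base.pos      # pos output, fed to bin/ner
ner_out=$base.ner      # final NER output
printf '%s\n' "$tok_out" "$pos_out" "$ner_out"
# prints:
# working/tutorial/example.tok
# working/tutorial/example.pos
# working/tutorial/example.ner
```

If an input does not end in .txt, ${input%.txt} leaves it unchanged, so a file such as notes.md would produce notes.md.tok and so on.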
I am not sure how to write a loop that does this while keeping each output file's name the same as the input, with the extension indicating which tagging step produced it (.tok, .pos, .ner). I was thinking of a bash script, or perhaps Perl to read the directory, but I am not sure how to call the C&C commands from within the script.
At the moment I am doing it manually and it's pretty time consuming to say the least!
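For the batch case, a bash loop over the directory can run the same three commands per file: the shell expands the variables inside the arguments before C&C ever sees them, so the commands are written exactly as in the tutorial. The sketch below is untested against C&C and assumes that it is run from the C&C root directory (so bin/ and models/ resolve as in the tutorial), that the inputs are *.txt files, and that the C&C tools exit non-zero on failure. Only the flags from the commands above are used. The names tag_dir, run and the DRY_RUN switch are hypothetical helpers introduced here, not part of C&C; DRY_RUN=1 prints the commands instead of executing them so the loop can be checked first.

```bash
#!/usr/bin/env bash
# Hypothetical batch script for the C&C pipeline (untested against C&C).
# Assumes the current directory is the C&C root, so bin/ and models/ resolve.
# Usage: ./tag_all.sh INPUT_DIR [OUTPUT_DIR]
#        DRY_RUN=1 ./tag_all.sh INPUT_DIR   # print commands, run nothing

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Tag every .txt file in $1, writing NAME.tok, NAME.pos and NAME.ner to $2
# (defaults to $1, i.e. next to the inputs).
tag_dir() {
  local in_dir=$1
  local out_dir=${2:-$1}
  local input name base

  run mkdir -p "$out_dir" || return

  for input in "$in_dir"/*.txt; do
    [ -e "$input" ] || continue   # no matches: the glob is left unexpanded
    name=${input##*/}             # drop the directory: example.txt
    base=$out_dir/${name%.txt}    # drop the extension: OUT/example

    # Each stage reads the previous stage's output; stop this file on failure.
    run bin/tokkie --input "$input" --quotes delete --output "$base.tok" &&
      run bin/pos --input "$base.tok" --model models/pos --output "$base.pos" &&
      run bin/ner --input "$base.pos" --model models/ner --output "$base.ner" ||
      echo "C&C failed on $input" >&2
  done
}

# Only run the loop when a directory argument is given.
if [ "$#" -gt 0 ]; then
  tag_dir "$@"
fi
```

For example, from the C&C root, DRY_RUN=1 ./tag_all.sh working/tutorial lists the commands for every .txt file, and ./tag_all.sh working/tutorial working/tagged would then write example.tok, example.pos and example.ner into working/tagged (tag_all.sh is just a placeholder file name). Chaining the stages with && means a failed tokkie run is not followed by a pos run on a missing .tok file, while the loop still moves on to the remaining files.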
Untested, likely needs some directory mangling.