I'm new to CRF++. I'm teaching myself by working through its manual: http://crfpp.googlecode.com/svn/trunk/doc/index.html?source=navbar#templ
And I don't understand what this means:
This is a template to describe unigram features. When you give a
template "U01:%x[0,1]", CRF++ automatically generates a set of feature
functions (func1 ... funcN) like:
func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O and feature="U01:DT") return 1 else return 0
....
funcXX = if (output = B-NP and feature="U01:NN") return 1 else return 0
funcXY = if (output = O and feature="U01:NN") return 1 else return 0
The number of feature functions generated by a template
amounts to (L * N), where L is the number of output
Why are there so many lines for the unigram features, and what do they mean?
For a particular template %x[i,j], i is the row offset relative to the current position, and j is the feature (column) you want to use. Given data:
%x[0,1] refers to the current token (row offset 0) and column 1: its POS tag is JJ and its output tag is I-NP.
Moving forward one token, %x[0,1] -> POS tag = NN, output tag = I-NP.
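As a rough illustration of how a %x[row,col] macro is resolved, here is a minimal Python sketch. It is not CRF++'s own code. The sample rows, the `expand` helper, and the `_B` placeholder for rows outside the sentence are all assumptions made for this example.

```python
import re

# Assumed sample data: (word, POS tag, output tag). Illustrative only.
DATA = [
    ("He", "PRP", "B-NP"),
    ("reckons", "VBZ", "B-VP"),
    ("the", "DT", "B-NP"),
    ("current", "JJ", "I-NP"),
    ("account", "NN", "I-NP"),
]

# Matches %x[row,col], allowing a negative row offset and an optional space.
MACRO = re.compile(r"%x\[(-?\d+),\s*(\d+)\]")


def expand(template, data, pos):
    """Replace each %x[row,col] with the value at (pos + row, col)."""
    def lookup(match):
        row = pos + int(match.group(1))
        col = int(match.group(2))
        if 0 <= row < len(data):
            return data[row][col]
        return "_B"  # hypothetical placeholder for rows outside the sentence
    return MACRO.sub(lookup, template)


# Expand the unigram template at every position in the sentence.
expanded = [expand("U01:%x[0,1]", DATA, pos) for pos in range(len(DATA))]
print(expanded)
```

With this data, the position of "current" yields "U01:JJ" and the next position yields "U01:NN", which matches the walk-through above.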
Each feature function corresponds to one pairing of a possible output tag with a possible value of the expanded feature, which here is the POS tag of the current word.
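To make the "one function per (output tag, feature string)" idea concrete, here is a small Python sketch. It only mimics the shape of the functions quoted from the manual. The label set, the feature strings, and the helper name `make_feature_function` are assumptions for illustration, and I am reading N as the number of distinct strings the template expands to.

```python
# Assumed output labels (L = 3) and expanded feature strings (N = 3).
LABELS = ["B-NP", "I-NP", "O"]
FEATURES = ["U01:DT", "U01:JJ", "U01:NN"]


def make_feature_function(label, feature):
    """Build an indicator function for one (label, feature) pair."""
    def func(output, observed):
        # Fires only when both the output tag and the feature string match.
        return 1 if output == label and observed == feature else 0
    return func


# One indicator function per (label, feature) pair: L * N in total.
functions = {
    (label, feature): make_feature_function(label, feature)
    for feature in FEATURES
    for label in LABELS
}

print(len(functions))  # L * N = 3 * 3

f = functions[("I-NP", "U01:JJ")]
print(f("I-NP", "U01:JJ"), f("O", "U01:JJ"))
```

This is why the manual lists so many lines: every feature string seen in the data is crossed with every output tag, and training then learns a separate weight for each resulting function.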
Update:
I think the explanation above is quite straightforward, provided that you understand the CRF model well.
CRF Model Reference
CRF++ is a replication of Sha and Pereira (2003)