Kernel for classification of variable length sequences of factors in kernlab

What is the best approach to defining a suitable kernel for classifying variable-length sequences of factors? I'm using kernlab in R.

Thanks!

There are 2 answers

Answer by lejlot

There is no generally good way. With variable-length sequences there is no dimension-to-dimension correspondence (position i in one sequence need not relate to position i in another), so a suitable kernel function depends entirely on the data and the problem.

However, the most basic approach, assuming that your factors are just elements of some big set, is to use a set-intersection kernel (the unnormalized relative of the Jaccard similarity |A ∩ B| / |A ∪ B|):

K(A, B) = |A ∩ B|

This simply measures the size of the intersection. It is easy to prove that it is a valid kernel: consider the feature map phi(A) that encodes the set A as a bit vector with a 1 in the i-th dimension iff the i-th element of the universe (from which A is sampled) is contained in A. Then K(A, B) = <phi(A), phi(B)>, an ordinary dot product of these vectors. Note that this kernel ignores both the order of the factors and how often each one occurs.
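
A minimal sketch of the intersection kernel, written in Python rather than R purely for illustration. The function names (`intersection_kernel`, `gram_matrix`, `bit_vector`) and the toy sequences are made up for this example and come from no library. It also checks numerically that the kernel agrees with the dot product of the bit vectors described above.

```python
from itertools import chain


def intersection_kernel(a, b):
    """K(A, B) = |A ∩ B|, treating each sequence as the set of its distinct factors."""
    return len(set(a) & set(b))


def gram_matrix(seqs, kernel):
    """Evaluate the kernel on every pair of sequences."""
    return [[kernel(s, t) for t in seqs] for s in seqs]


def bit_vector(seq, universe):
    """phi(A): 1 in position i iff the i-th element of the universe occurs in seq."""
    present = set(seq)
    return [1 if u in present else 0 for u in universe]


# Toy data (hypothetical): sequences of different lengths over the same factor levels.
seqs = [["a", "b", "c"], ["b", "c", "c", "d", "e"], ["e"]]

K = gram_matrix(seqs, intersection_kernel)

# The same matrix computed through the explicit feature map.
universe = sorted(set(chain.from_iterable(seqs)))
phi = [bit_vector(s, universe) for s in seqs]
K_dot = [[sum(x * y for x, y in zip(p, q)) for q in phi] for p in phi]

print(K)
print(K == K_dot)
```

A matrix computed this way can be passed to any SVM implementation that accepts a precomputed kernel. In kernlab this is, as far as I know, done by wrapping the matrix with `as.kernelMatrix` before calling `ksvm`; check the package documentation for the exact usage.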

Answer by Eric

You should read about:

  • Kernels inspired by Dynamic Time Warping (DTW) that are constrained to be positive definite symmetric (PDS), such as global alignment kernels.

  • String kernels, commonly used for DNA sequence analysis (see the spectrum kernel, the mismatch kernel, ...).
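
To make the string-kernel suggestion concrete, here is a rough sketch of a k-spectrum kernel adapted to sequences of factor levels instead of characters. It is written in Python for illustration only. The names `kmer_counts`, `spectrum_kernel` and `normalized_spectrum_kernel`, the default `k=2`, and the toy sequences are all assumptions for this example. The kernel counts every contiguous run of k factors in each sequence and takes the dot product of the two count vectors, so unlike a set-based kernel it is sensitive to local order.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every contiguous run of k factors (a k-mer) in the sequence."""
    seq = tuple(seq)
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Dot product of the k-mer count vectors of the two sequences."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[kmer] for kmer, n in ca.items())


def normalized_spectrum_kernel(a, b, k=2):
    """Cosine-normalized variant, which reduces the effect of sequence length."""
    kab = spectrum_kernel(a, b, k)
    if kab == 0:
        return 0.0
    return kab / sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))


# Toy data (hypothetical): two sequences of factor levels with different lengths.
x = ["lo", "hi", "lo", "hi", "mid"]
y = ["hi", "lo", "hi"]

print(spectrum_kernel(x, y, k=2))
print(normalized_spectrum_kernel(x, y, k=2))
```

Sequences shorter than k contain no k-mers, so their kernel value with anything is 0. If the factors can be encoded as single characters, kernlab's `stringdot` with `type = "spectrum"` should, if I recall correctly, cover the same idea natively.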