Confused about X in GaussianHMM.fit([X])


With this code:

X = numpy.array(range(0,5))
model = GaussianHMM(n_components=3,covariance_type='full', n_iter=1000)
model.fit([X])

I get

tuple index out of range 
self.n_features = obs[0].shape[1]

So what exactly are you supposed to pass to .fit()? The hidden states AND the emissions in a tuple? If so, in what order? The documentation is not much help.

I noticed it seems to want pairs of values (one pair per row), as this does not give an error:

X = numpy.column_stack([range(0,5),range(0,5)])
model = GaussianHMM(n_components=3,covariance_type='full', n_iter=1000)
model.fit([X])
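The two snippets differ only in the shape of X. The traceback line reads `obs[0].shape[1]`, and because `fit` receives the list `[X]`, `obs[0]` is X itself. Below is a minimal NumPy-only sketch of that check; the helper `n_features_of` is hypothetical and mimics only the failing line, not the library's actual code. It also shows that a single feature stored as one column, shape (5, 1), passes the same check.

```python
import numpy as np


def n_features_of(obs):
    """Hypothetical helper mimicking the failing line: obs[0].shape[1]."""
    return obs[0].shape[1]


# 1-D array: shape (5,), so there is no shape[1] and indexing it fails.
x_1d = np.array(range(0, 5))
try:
    n_features_of([x_1d])
except IndexError as exc:
    print("1-D:", x_1d.shape, "->", exc)  # tuple index out of range

# column_stack of two ranges: shape (5, 2), i.e. 5 time steps, 2 features.
x_2d = np.column_stack([range(0, 5), range(0, 5)])
print("column_stack:", x_2d.shape, "-> n_features =", n_features_of([x_2d]))

# A single feature stored as one column: shape (5, 1).
x_col = x_1d.reshape(-1, 1)
print("reshape(-1, 1):", x_col.shape, "-> n_features =", n_features_of([x_col]))
```

So the requirement is a 2-D array per sequence (rows are time steps, columns are features), not tuples as such.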

Edit:

Let me clarify a bit: the documentation says the input should be:

List of array-like observation sequences (shape (n_i, n_features)).

This almost suggests passing, for each sample, a tuple that indicates in binary fashion which observations are present. However, their example suggests otherwise:

# pack diff and volume for training
X = np.column_stack([diff, volume])

Hence the confusion.
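One way to read the documented format "List of array-like observation sequences (shape (n_i, n_features))" is sketched below: each list element is one whole sequence, row t of a sequence holds the feature values observed at step t, the lengths n_i may differ, and n_features must match. The `diff` and `volume` numbers and the second sequence are made-up stand-ins, and `check_obs` is a hypothetical helper, not part of the library.

```python
import numpy as np

# Hypothetical stand-ins for the example's `diff` and `volume` series
# (made-up numbers, only there to show the shapes).
diff = np.array([0.5, -0.2, 0.1, 0.3, -0.4])
volume = np.array([100.0, 120.0, 90.0, 110.0, 95.0])

# One observation sequence: row t holds the two feature values at step t.
seq_a = np.column_stack([diff, volume])  # shape (5, 2): n_1 = 5, n_features = 2

# A second, shorter sequence with the same two features (also made up).
seq_b = np.column_stack([[0.2, -0.1, 0.0], [105.0, 98.0, 101.0]])  # shape (3, 2)

# "List of array-like observation sequences (shape (n_i, n_features))"
obs = [seq_a, seq_b]


def check_obs(obs):
    """Hypothetical check: every sequence is 2-D and shares n_features."""
    n_features = obs[0].shape[1]
    for seq in obs:
        if seq.ndim != 2 or seq.shape[1] != n_features:
            raise ValueError(f"bad sequence shape {seq.shape}")
    return n_features


print([seq.shape for seq in obs], "n_features =", check_obs(obs))
# -> [(5, 2), (3, 2)] n_features = 2
```

Read this way, the columns are not binary presence flags; they are the observed values themselves.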


There are 2 answers

Answer by Brooks

It would appear the GaussianHMM function is built around multivariate emissions, hence the requirement that each observation be a vector (a 2-D array with one column per emission dimension) rather than a scalar. When the documentation refers to 'n_features', it does not mean the number of distinct values an emission can take, but the number of dimensions of each emission vector.

So "features" (the dimensions of the emission vector) are not to be confused with "symbols", which in sklearn's parlance (likely shared with the wider HMM community, for all I know) refer to the distinct values the system is capable of emitting.

For univariate problems whose emissions are discrete symbols, use MultinomialHMM.
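To make the features/symbols distinction concrete, the sketch below counts both for small arrays. The string labels are made-up data, and encoding them as integer indices 0..n_symbols-1 is an assumption about what a discrete-emission model such as MultinomialHMM would want, not something confirmed in this thread.

```python
import numpy as np

# Continuous, multivariate emissions: each row is one emission vector.
X = np.column_stack([range(0, 5), range(0, 5)])
n_features = X.shape[1]  # dimensions per emission vector

# Discrete emissions: a 1-D sequence drawn from a finite set of values.
raw = np.array(["up", "down", "up", "flat", "down", "up"])  # hypothetical data

# Encode each distinct value as an integer symbol. np.unique returns the
# sorted distinct values and, with return_inverse=True, the position of
# each element's value within that sorted array.
values, symbols = np.unique(raw, return_inverse=True)
n_symbols = len(values)

print("n_features:", n_features)                                # 2
print("values:", values.tolist(), "n_symbols:", n_symbols)      # ['down', 'flat', 'up'] 3
print("symbol sequence:", symbols.tolist())                     # [2, 0, 2, 1, 0, 2]
```

Here n_features counts columns of the emission vector, while n_symbols counts how many different values a single scalar emission can take.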

Hope that clarifies things for anyone else who wants to use this stuff without becoming the world's foremost authority on HMMs :)

Answer by dixon1e

I realize this is an old thread, but the problem in the example code is still there. I believe the example is now at this link, and it still gives the same error:

tuple index out of range 
self.n_features = obs[0].shape[1]

The offending line of code is:

model = GaussianHMM(n_components=5, covariance_type="diag", n_iter=1000).fit(X)

It should be:

model = GaussianHMM(n_components=5, covariance_type="diag", n_iter=1000).fit([X])
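Why the brackets matter can be seen from what `obs[0]` refers to in the traceback line. This NumPy-only sketch (no HMM library involved) assumes, as the traceback suggests, that `fit` treats the first element of whatever it is given as the first observation sequence.

```python
import numpy as np

X = np.column_stack([range(0, 5), range(0, 5)])  # one sequence, shape (5, 2)

# fit(X): indexing X itself, so obs[0] is the first ROW of X.
print("fit(X)   -> obs[0].shape =", X[0].shape)    # (2,): 1-D, no shape[1]

# fit([X]): indexing the list, so obs[0] is the whole sequence.
print("fit([X]) -> obs[0].shape =", [X][0].shape)  # (5, 2): shape[1] == 2
```

Without the brackets, the first row of X is 1-D, so `shape[1]` fails with the same "tuple index out of range" error.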