How to handle categorical data in an sklearn GMM mixture model


This may be a stupid question, but is there a way to feed categorical observations into sklearn's GMM module?

My data looks somewhat like:

User,Site_category,user_segment

UserA,Sports:News,efk-457
UserB,Music:Entertainment,asl-567
UserC,Sports:News,asl-567
UserD,Sports:News,efk-457

user_segment is the class label in my data set (there are about 10 classes). I see the data as a mixture of 10 different distributions.

Given a test user and their site category, I want to know which class/distribution that test case would belong to.

I know I can opt for a discriminative model, but I want to see how a generative model does in this case.


There is 1 answer

Super_Cat On

The StepMix package follows the sklearn interface. Here's how you can fit categorical/multinoulli mixtures:

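from stepmix.stepmix import StepMix

# Categorical StepMix Model with 3 latent classes
model = StepMix(n_components=3, measurement="categorical", verbose=0, random_state=123)

# Fit model and predict clusters
# (data_categorical is assumed to hold integer-encoded categories, one column per variable)
model.fit(data_categorical)
preds = model.predict(data_categorical)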
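
Since user_segment is already known in the training data, a supervised generative model may also be worth trying alongside the mixture. The following is only a minimal sketch in plain Python, not a GMM and not part of StepMix: it estimates P(segment) and P(site category | segment) from counts and applies Bayes' rule. The function names `fit` and `posterior` and the smoothing constant `alpha` are my own assumptions for illustration, and the rows are just the four examples from the question.

```python
from collections import Counter, defaultdict

# Training rows taken from the question: (site_category, user_segment)
rows = [
    ("Sports:News", "efk-457"),
    ("Music:Entertainment", "asl-567"),
    ("Sports:News", "asl-567"),
    ("Sports:News", "efk-457"),
]


def fit(rows, alpha=1.0):
    """Estimate P(segment) and P(category | segment) with add-alpha smoothing."""
    segment_counts = Counter(seg for _, seg in rows)
    pair_counts = defaultdict(Counter)
    for cat, seg in rows:
        pair_counts[seg][cat] += 1
    categories = sorted({cat for cat, _ in rows})
    n = len(rows)
    prior = {seg: count / n for seg, count in segment_counts.items()}
    likelihood = {
        seg: {
            cat: (pair_counts[seg][cat] + alpha)
            / (segment_counts[seg] + alpha * len(categories))
            for cat in categories
        }
        for seg in segment_counts
    }
    return prior, likelihood


def posterior(category, prior, likelihood):
    """Return P(segment | category) via Bayes' rule.

    Raises KeyError for a category never seen during fitting.
    """
    joint = {seg: prior[seg] * likelihood[seg][category] for seg in prior}
    total = sum(joint.values())
    return {seg: p / total for seg, p in joint.items()}


prior, likelihood = fit(rows)
post = posterior("Sports:News", prior, likelihood)
best = max(post, key=post.get)  # most probable segment for this category
print(best, post)
```

With only one categorical feature this reduces to a smoothed lookup table of class frequencies per category, so the generative and discriminative answers will largely coincide. The generative view becomes more interesting once you add further categorical columns and multiply their per-class likelihoods.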