Uni-modal vs multi-modal models: how to pick the best fit

My goal is to create a model of a data distribution. I want to pick the best model, but I am not sure how to compare the models I obtained.

For the uni-modal distribution I used the Fitter library. For the multi-modal case I tried Gaussian Mixture Models (GMM). Both report AIC/BIC.

In Fitter, the information criteria are computed here (see the _fit_single_distribution() method). In GMM, they are computed here.

In my case the values differ by orders of magnitude:

  • uni-modal - 1500-1600
  • multi-modal (2 normal distributions) - 155000-157000

Sample size used for fitting: ~16800 samples

Single distribution fit: [figure: uni-modal fit]

Two-distribution fit: [figure: multi-modal fit]

AIC/BIC values (multi-modal), with 2 components picked in my example: [figure: AIC/BIC multi-modal]

There is 1 answer

Answered by Eugen

One way to compare the models is to use the log-likelihood (LL).

For the GMM I computed the LL with model.score(data), where model is the fitted model returned by fit().
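As a minimal sketch of that call: in scikit-learn, GaussianMixture.score() returns the average log-likelihood per sample, not the total, which matters when comparing it against a number computed elsewhere. The synthetic data and variable names below are assumptions for illustration only, not the data from the question.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic bimodal sample standing in for the real data (an assumption).
data = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(6.0, 1.5, 500)])
X = data.reshape(-1, 1)  # scikit-learn expects shape (n_samples, n_features)

model = GaussianMixture(n_components=2, random_state=0).fit(X)

mean_ll = model.score(X)                 # average log-likelihood per sample
total_ll = model.score_samples(X).sum()  # log-likelihood summed over all samples
```

Multiplying mean_ll by the number of samples gives total_ll, so either form can be used as long as both models are reported the same way.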

For the unimodal distribution I used this function to compute LL:

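import numpy as np

def loglik_unimodal(data, dist):
    # Bin edges of a 100-bin histogram of the data (the densities are unused).
    _, edges = np.histogram(data, bins=100, density=True)
    # Midpoint of each bin.
    centers = (edges[:-1] + edges[1:]) / 2.0
    # Mean log-density at the bin centers; every bin gets equal weight.
    return np.mean(dist.logpdf(centers))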

dist is the fitted (frozen) distribution, e.g. scipy.stats.lognorm(s, loc, scale), where s is the shape parameter.
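The function above evaluates the density at histogram bin centers, so it is not on quite the same footing as model.score(data), which averages over the raw samples. One possible way to put both models on the same scale is to compute the log-likelihood on the raw samples for each and apply the same AIC/BIC formula to both. The sketch below assumes synthetic data and a normal distribution as the unimodal candidate; the helper names total_loglik_scipy and aic_bic are hypothetical, not part of Fitter or scikit-learn.

```python
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

def total_loglik_scipy(data, dist):
    """Summed log-likelihood of the raw samples under a frozen scipy distribution."""
    return float(np.sum(dist.logpdf(data)))

def aic_bic(total_ll, n_params, n_samples):
    """Standard AIC and BIC from a total log-likelihood."""
    aic = 2 * n_params - 2 * total_ll
    bic = n_params * np.log(n_samples) - 2 * total_ll
    return aic, bic

rng = np.random.default_rng(0)
# Synthetic, clearly bimodal sample (an assumption, not the question's data).
data = np.concatenate([rng.normal(2.0, 0.5, 1000), rng.normal(6.0, 0.8, 1000)])
X = data.reshape(-1, 1)
n = len(data)

# Unimodal candidate: a normal fitted by maximum likelihood (2 free parameters).
mu, sigma = stats.norm.fit(data)
uni_ll = total_loglik_scipy(data, stats.norm(mu, sigma))
uni_aic, uni_bic = aic_bic(uni_ll, 2, n)

# Two-component GMM; score() is per-sample, so multiply by n for the total.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
gmm_ll = gmm.score(X) * n
# 1-D data, 2 components: 2 means + 2 variances + 1 free weight = 5 parameters.
gmm_aic, gmm_bic = aic_bic(gmm_ll, 5, n)

print(f"unimodal AIC={uni_aic:.1f} BIC={uni_bic:.1f}")
print(f"GMM      AIC={gmm_aic:.1f} BIC={gmm_bic:.1f}")
```

With both criteria computed from the same samples and the same formula, the lower AIC/BIC indicates the preferred model, and the values should be of comparable magnitude.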