Peak fitting with gaussian mixure model (Scikit); how to sample from a discrete pdf?

630 views Asked by At

As far as I understood the Gaussian mixture model in scikit learn expects samples drawn from a distribution consisting of several Gaussians. E.g from astroML:

They generate samples using gmm:

from matplotlib import pyplot as plt
import numpy as np
from sklearn.mixture import GMM

np.random.seed(1)

gmm = GMM(3, n_iter=1)
gmm.means_ = np.array([[-1], [0], [3]])
gmm.covars_ = np.array([[1.5], [1], [0.5]]) ** 2
gmm.weights_ = np.array([0.3, 0.5, 0.2])

X = gmm.sample(1000)

Essentially, they fit a Gaussian mixture model to the data like (pseudocode):

 GMM(number of peaks).fit(X)

X looks like:

Randomly drawn samples from gmm

The histogramm of X:

plt.hist(X, 30, normed=True, histtype='stepfilled', alpha=0.4)

Histogram of X

My data looks differently. I do not have samples drawn from a distribution. The measuring device already provides me with a discrete probability distribution (particle size):

Particle size distribution

How do I convert the discrete pdf ?

I tried:

from scipy import stats
custm = stats.rv_discrete(name='custm', values=(x, y))

But custm.pmf(xk) does not reproduce the distribution.

My data:

x = np.array([  1.00074269e-02,   1.13692409e-02,   1.29163711e-02,
         1.46740353e-02,   1.66708829e-02,   1.89394623e-02,
         2.15167508e-02,   2.44447575e-02,   2.77712084e-02,
         3.15503239e-02,   3.58437026e-02,   4.07213258e-02,
         4.62626976e-02,   5.25581412e-02,   5.97102709e-02,
         6.78356648e-02,   7.70667650e-02,   8.75540365e-02,
         9.94684195e-02,   1.13004116e-01,   1.28381754e-01,
         1.45851987e-01,   1.65699574e-01,   1.88248029e-01,
         2.13864884e-01,   2.42967690e-01,   2.76030816e-01,
         3.13593183e-01,   3.56267049e-01,   4.04747990e-01,
         4.59826233e-01,   5.22399543e-01,   5.93487850e-01,
         6.74249878e-01,   7.66002029e-01,   8.70239845e-01,
         9.88662378e-01,   1.12319989e+00,   1.27604531e+00,
         1.44968999e+00,   1.64696429e+00,   1.87108375e+00,
         2.12570146e+00,   2.41496764e+00,   2.74359726e+00,
         3.11694690e+00,   3.54110209e+00,   4.02297646e+00,
         4.57042446e+00,   5.19236937e+00,   5.89894875e+00,
         6.70167970e+00,   7.61364654e+00,   8.64971415e+00,
         9.82677018e+00,   1.11640004e+01,   1.26832013e+01,
         1.44091357e+01,   1.63699358e+01,   1.85975622e+01,
         2.11283247e+01,   2.40034743e+01,   2.72698752e+01,
         3.09807691e+01,   3.51966426e+01,   3.99862137e+01,
         4.54275512e+01,   5.16093477e+01,   5.86323653e+01,
         6.66110774e+01,   7.56755354e+01,   8.59734879e+01,
         9.76727893e+01,   1.10964136e+02,   1.26064173e+02,
         1.43219028e+02,   1.62708322e+02,   1.84849725e+02,
         2.10004139e+02,   2.38581573e+02,   2.71047834e+02,
         3.07932115e+02,   3.49835622e+02,   3.97441372e+02,
         4.51525329e+02,   5.12969048e+02,   5.82774050e+02,
         6.62078141e+02,   7.52173958e+02,   8.54530045e+02,
         9.70814783e+02,   1.10292359e+03,   1.25300980e+03,
         1.42351980e+03,   1.61723286e+03,   1.83730645e+03,
         2.08732774e+03,   2.37137201e+03,   2.69406911e+03,
         3.06067895e+03,   3.47717718e+03])

y = array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.09,  0.18,  0.31,
        0.48,  0.69,  0.94,  1.21,  1.49,  1.76,  2.  ,  2.2 ,  2.36,
        2.47,  2.56,  2.63,  2.69,  2.76,  2.84,  2.91,  2.99,  3.05,
        3.1 ,  3.13,  3.14,  3.11,  3.06,  2.96,  2.81,  2.63,  2.42,
        2.21,  2.03,  1.94,  1.95,  2.09,  2.34,  2.66,  2.97,  3.18,
        3.22,  3.04,  2.64,  2.07,  1.43,  0.83,  0.36,  0.09,  0.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ])
0

There are 0 answers