As far as I understood the Gaussian mixture model in scikit learn expects samples drawn from a distribution consisting of several Gaussians. E.g from astroML:
They generate samples using gmm:
from matplotlib import pyplot as plt
import numpy as np
from sklearn.mixture import GMM
np.random.seed(1)
gmm = GMM(3, n_iter=1)
gmm.means_ = np.array([[-1], [0], [3]])
gmm.covars_ = np.array([[1.5], [1], [0.5]]) ** 2
gmm.weights_ = np.array([0.3, 0.5, 0.2])
X = gmm.sample(1000)
Essentially, they fit a Gaussian mixture model to the data like (pseudocode):
GMM(number of peaks).fit(X)
X looks like:
The histogramm of X:
plt.hist(X, 30, normed=True, histtype='stepfilled', alpha=0.4)
My data looks differently. I do not have samples drawn from a distribution. The measuring device already provides me with a discrete probability distribution (particle size):
How do I convert the discrete pdf ?
I tried:
from scipy import stats
custm = stats.rv_discrete(name='custm', values=(x, y))
But custm.pmf(xk)
does not reproduce the distribution.
My data:
x = np.array([ 1.00074269e-02, 1.13692409e-02, 1.29163711e-02,
1.46740353e-02, 1.66708829e-02, 1.89394623e-02,
2.15167508e-02, 2.44447575e-02, 2.77712084e-02,
3.15503239e-02, 3.58437026e-02, 4.07213258e-02,
4.62626976e-02, 5.25581412e-02, 5.97102709e-02,
6.78356648e-02, 7.70667650e-02, 8.75540365e-02,
9.94684195e-02, 1.13004116e-01, 1.28381754e-01,
1.45851987e-01, 1.65699574e-01, 1.88248029e-01,
2.13864884e-01, 2.42967690e-01, 2.76030816e-01,
3.13593183e-01, 3.56267049e-01, 4.04747990e-01,
4.59826233e-01, 5.22399543e-01, 5.93487850e-01,
6.74249878e-01, 7.66002029e-01, 8.70239845e-01,
9.88662378e-01, 1.12319989e+00, 1.27604531e+00,
1.44968999e+00, 1.64696429e+00, 1.87108375e+00,
2.12570146e+00, 2.41496764e+00, 2.74359726e+00,
3.11694690e+00, 3.54110209e+00, 4.02297646e+00,
4.57042446e+00, 5.19236937e+00, 5.89894875e+00,
6.70167970e+00, 7.61364654e+00, 8.64971415e+00,
9.82677018e+00, 1.11640004e+01, 1.26832013e+01,
1.44091357e+01, 1.63699358e+01, 1.85975622e+01,
2.11283247e+01, 2.40034743e+01, 2.72698752e+01,
3.09807691e+01, 3.51966426e+01, 3.99862137e+01,
4.54275512e+01, 5.16093477e+01, 5.86323653e+01,
6.66110774e+01, 7.56755354e+01, 8.59734879e+01,
9.76727893e+01, 1.10964136e+02, 1.26064173e+02,
1.43219028e+02, 1.62708322e+02, 1.84849725e+02,
2.10004139e+02, 2.38581573e+02, 2.71047834e+02,
3.07932115e+02, 3.49835622e+02, 3.97441372e+02,
4.51525329e+02, 5.12969048e+02, 5.82774050e+02,
6.62078141e+02, 7.52173958e+02, 8.54530045e+02,
9.70814783e+02, 1.10292359e+03, 1.25300980e+03,
1.42351980e+03, 1.61723286e+03, 1.83730645e+03,
2.08732774e+03, 2.37137201e+03, 2.69406911e+03,
3.06067895e+03, 3.47717718e+03])
y = array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0.09, 0.18, 0.31,
0.48, 0.69, 0.94, 1.21, 1.49, 1.76, 2. , 2.2 , 2.36,
2.47, 2.56, 2.63, 2.69, 2.76, 2.84, 2.91, 2.99, 3.05,
3.1 , 3.13, 3.14, 3.11, 3.06, 2.96, 2.81, 2.63, 2.42,
2.21, 2.03, 1.94, 1.95, 2.09, 2.34, 2.66, 2.97, 3.18,
3.22, 3.04, 2.64, 2.07, 1.43, 0.83, 0.36, 0.09, 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. ])