Sklearn GMM gives shifted gaussian peaks


I'm fitting a mixture of two gaussians to 1D data (over 1000 points).

It seems that the peaks of the sum of two gaussians are shifted to the left relative to the peaks of the histogram. I assume this is due to my data having a cut-off at around 0.5.

In the plot (image not included), the green and red lines are the two best-fitting Gaussians and the black line is their sum.

Is there any way I can ensure that the peaks match, even though there is a lack of data points on the right?

I'm using:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import mixture
    import scipy.stats as stats

    # sklearn expects shape (n_samples, n_features), so make the 1D data a column
    g = mixture.GaussianMixture(n_components=2, covariance_type='full')
    g.fit(np.reshape(data, (-1, 1)))
    weights = g.weights_
    means = g.means_.ravel()
    sigmas = np.sqrt(g.covariances_.ravel())  # standard deviations of the components

    num_bins = 50
    # x was not defined in the original snippet; an evaluation grid over the data range is assumed
    x = np.linspace(np.min(data), np.max(data), 500)
    # density=True replaces the removed normed=True argument
    n, bins, patches = plt.hist(np.ravel(data), num_bins, density=True, facecolor='blue', alpha=0.2)
    pdf0 = weights[0] * stats.norm.pdf(x, means[0], sigmas[0])
    pdf1 = weights[1] * stats.norm.pdf(x, means[1], sigmas[1])
    plt.plot(x, pdf0, c='red')
    plt.plot(x, pdf1, c='green')
    plt.plot(x, pdf0 + pdf1, c='black')

There is 1 answer

lito On

You are simply adding the green Gaussian to the red one. Because the two Gaussians overlap heavily, the peaks of the summed (black) curve do not sit at the peaks of the individual components. For the peaks to match, the green Gaussian would have to contribute nothing to the sum near the red Gaussian's peak.
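To see the effect of the overlap in isolation, here is a minimal sketch that sums two overlapping Gaussians and locates the peaks of the total. The weights, means, and standard deviations are hypothetical values picked only for illustration; they are not the fitted values from the question.

```python
import numpy as np
from scipy import stats

# Hypothetical mixture parameters, chosen only for illustration
weights = np.array([0.5, 0.5])
means = np.array([0.0, 1.0])
sigmas = np.array([0.4, 0.4])

# Evaluate each weighted component on a fine grid, then sum them
x = np.linspace(-3.0, 4.0, 7001)
components = weights[:, None] * stats.norm.pdf(x[None, :], means[:, None], sigmas[:, None])
total = components.sum(axis=0)

# Local maxima of the summed curve (interior grid points only)
is_peak = (total[1:-1] > total[:-2]) & (total[1:-1] >= total[2:])
sum_peaks = x[1:-1][is_peak]

print("component means:", means)
print("peaks of the sum:", sum_peaks)
```

With these values the sum still has two peaks, but each one is pulled toward the other component, so neither sits exactly at its component's mean. The more the components overlap, the larger that shift becomes.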