I'm having troubles when fitting a pdf to an histogram in Matlab. I'm using gmdistribution.fit because my data is multi-modal. This is what I have done:
data=[0.35*randn(1,100000), 0.5*randn(1,100000)+5, 1*randn(1,100000)+3]'; %multimodal data
x=min(data):(max(data)-min(data))/10000:max(data);
%Normalized Histogram
[counts,edges]=histcounts(data,500, 'Normalization', 'pdf');
bw=edges(2)-edges(1);
centers=edges(1:end-1)+bw;
H = bar(centers,counts,'hist');
hold on
%Fitting with gmdistribution
rng default
obj=gmdistribution.fit(data,3,'Replicates',5);
%the PDF
PDF=zeros(1,length(x));
for i=1:obj.NumComponents
k=obj.ComponentProportion(i);
u=obj.mu(i);
sigma=obj.Sigma(i);
PDF=PDF+k*normpdf(x,u,sigma);
end
PDF=PDF/trapz(x,PDF); %normalization (just in case)
plot(x,PDF)
%Fitting with ksdensity (for comparison)
[PDF2,xi]=ksdensity(data,x);
plot(x,PDF2)
legend('Normalized Histogram','gmdistribution','ksdensity')
As you can see, the Gaussian Mixture doesn't fit the histogram properly. The PDF from the ksdensiti function is much better. I have also tried to fit just one gaussian. If you run the same previous code, using data=[0.35*randn(1,100000)]'; and obj=gmdistribution.fit(data,1,'Replicates',5); you get the following
Histogram and PDFs for one gaussian
Again, the pdf from gmdistribution doesn't fit the histogram. It seems that the problem is with the scaling factor in the data generation (the 0.35). What am I doing wrong?
The
Sigma
parameter of thegmdistribution
object corresponds to the covariance, however, thenormpdf
function needs the standard deviation. The problem is fixed by replacingnormpdf(x,u,sigma)
withnormpdf(x,u,sqrt(sigma))
in the for loop.