I'd like to combine Gaussian mixture models (GMMs) without access to the models' underlying input values (because I don't want to store all of them).
I had two ideas:
- Take the weights, means and covariances from both GMMs, combine them (weighting the weights by the number of input samples), and build the combined model from these, with the number of components equal to the sum of the components of the individual models.
Pseudo-example for two GMMs:
import numpy as np
from sklearn.mixture import GaussianMixture

data_1 = [1, 2, 1, 3, 1, 2, 5, 2, 1, 2, 7, 6]
data_2 = [5, 4, 6, 5, 6, 5, 8, 5, 6, 7, 12, 12, 13, 15, 20]
data_1 = np.expand_dims(data_1, 1)
data_2 = np.expand_dims(data_2, 1)

# get_best_amount_Gaussians is my own helper that picks the number of
# components (a sketch of it is at the end of this post)
n_comp_1 = get_best_amount_Gaussians(data_1)
n_comp_2 = get_best_amount_Gaussians(data_2)

gm1 = GaussianMixture(n_components=n_comp_1)
gm1.fit(data_1)
gm2 = GaussianMixture(n_components=n_comp_2)
gm2.fit(data_2)
n_comp_3 = n_comp_1 + n_comp_2

# get weights, means and covariances from the fitted models;
# weight the weights by the number of input values and renormalise
gm3_weights = np.concatenate((gm1.weights_ * len(data_1),
                              gm2.weights_ * len(data_2)), axis=0)
gm3_weights /= gm3_weights.sum()
gm3_means = np.concatenate((gm1.means_, gm2.means_), axis=0)
gm3_cov = np.concatenate((gm1.covariances_, gm2.covariances_), axis=0)

# build gm3 based on these values:
gm3 = GaussianMixture(n_components=n_comp_3)
gm3.weights_ = gm3_weights
gm3.means_ = gm3_means
gm3.covariances_ = gm3_cov
# sklearn also needs the precision Cholesky factors set before gm3 can
# score new data:
gm3.precisions_cholesky_ = np.linalg.cholesky(np.linalg.inv(gm3_cov))
- This has the disadvantage that the resulting model may be too complex, since components with the same or similar means are not merged (a rough merging sketch follows below).
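A way I could imagine reducing that complexity (a rough, untested sketch; merge_two_components and the idea of thresholding on mean distance are my own assumptions, not an existing API) is to merge near-duplicate components by moment matching before building gm3:

# Untested sketch: merge two Gaussian components (w, mu, cov) into one by
# matching the first two moments of their weighted sum.
def merge_two_components(w1, mu1, cov1, w2, mu2, cov2):
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    # combined second moment minus the outer product of the combined mean
    cov = (w1 * (cov1 + np.outer(mu1, mu1))
           + w2 * (cov2 + np.outer(mu2, mu2))) / w - np.outer(mu, mu)
    return w, mu, cov

Pairs whose means lie within, say, one standard deviation of each other could be merged this way, which would keep n_comp_3 below n_comp_1 + n_comp_2.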
- Sample 1000 (or some other large factor) times the number of input values per Gaussian model and fit the new model to the sampled data.
# draw samples from each model in proportion to its original data size;
# note that sample() returns a tuple (X, component_labels)
n_samples_1 = 1000 * len(data_1)
n_samples_2 = 1000 * len(data_2)
samples_1, _ = gm1.sample(n_samples_1)
samples_2, _ = gm2.sample(n_samples_2)
samples = np.concatenate((samples_1, samples_2), axis=0)

# n_comp_3 = get_best_amount_Gaussians(samples)
gm3 = GaussianMixture(n_components=5)
gm3.fit(samples)
...
- This is computationally demanding.
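For completeness, get_best_amount_Gaussians used above is just a placeholder for my model-selection helper; a minimal sketch of what I mean, selecting the component count by BIC (the search range of 10 is an arbitrary choice):

def get_best_amount_Gaussians(data, max_components=10):
    # fit mixtures with 1..max_components components and return the
    # component count with the lowest BIC
    bics = [GaussianMixture(n_components=k).fit(data).bic(data)
            for k in range(1, max_components + 1)]
    return int(np.argmin(bics)) + 1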
Is there another way to do so?
Thanks a lot in advance!