GMM - log-likelihood isn't monotonic


Yesterday I implemented a GMM (Gaussian Mixture Model) using the expectation-maximization algorithm.

As a reminder, it models an unknown distribution as a mixture of Gaussians, for which we need to learn the mean, the covariance, and the mixing weight of each Gaussian.

This is the mathematics behind the code (it's not that complicated): http://mccormickml.com/2014/08/04/gaussian-mixture-models-tutorial-and-matlab-code/
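In other words, the density of a point x under the model is a weighted sum of Gaussian densities. A tiny sketch of evaluating it with scipy (the parameters here are made up purely for illustration):

import numpy as np
from scipy.stats import multivariate_normal

# illustrative parameters, not taken from the code below
weights = np.array([0.6, 0.4])        # mixing weights (priors), sum to 1
means = [np.zeros(2), np.ones(2)]     # one mean vector per Gaussian
covs = [np.eye(2), 2 * np.eye(2)]     # one covariance matrix per Gaussian

x = np.array([0.5, 0.5])
density = sum(w * multivariate_normal(mean=mu, cov=c).pdf(x)
              for w, mu, c in zip(weights, means, covs))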

This is my code:

import numpy as np
from scipy.stats import multivariate_normal
import matplotlib.pyplot as plt

#reference for this code is http://mccormickml.com/2014/08/04/gaussian-mixture-models-tutorial-and-matlab-code/

def expectation(data, means, covs, priors): #E-step. returns the updated probabilities
    m = data.shape[0]                       #gets the data, means, covariances and priors of all clusters
    numOfClusters = priors.shape[0]

    probabilities = np.zeros((m, numOfClusters))
    for i in range(0, m):
        for j in range(0, numOfClusters):
            sum = 0
            for l in range(0, numOfClusters):
                sum += normalPDF(data[i, :], means[l], covs[l]) * priors[l, 0]
            probabilities[i, j] = normalPDF(data[i, :], means[j], covs[j]) * priors[j, 0] / sum

    return probabilities

def maximization(data, probabilities): #M-step. this updates the means, covariances, and priors of all clusters
    m, n = data.shape
    numOfClusters = probabilities.shape[1]

    means = np.zeros((numOfClusters, n))
    covs = np.zeros((numOfClusters, n, n))
    priors = np.zeros((numOfClusters, 1))

    for i in range(0, numOfClusters):
        priors[i, 0] = np.sum(probabilities[:, i]) / m #update priors

        for j in range(0, m): #update means
            means[i] += probabilities[j, i] * data[j, :]

            vec = np.reshape(data[j, :] - means[i, :], (n, 1))
            covs[i] += probabilities[j, i] * np.dot(vec, vec.T) #update covs

        means[i] /= np.sum(probabilities[:, i])
        covs[i] /= np.sum(probabilities[:, i])

    return [means, covs, priors]

def normalPDF(x, mean, covariance): #this is simply multivariate normal pdf
    n = len(x)

    mean = np.reshape(mean, (n, ))
    x = np.reshape(x, (n, ))

    var = multivariate_normal(mean=mean, cov=covariance,)
    return var.pdf(x)


def initClusters(numOfClusters, data): #initialize all the gaussian clusters (means, covariances, priors)
    m, n = data.shape

    means = np.zeros((numOfClusters, n))
    covs = np.zeros((numOfClusters, n, n))
    priors = np.zeros((numOfClusters, 1))

    initialCovariance = np.cov(data.T)

    for i in range(0, numOfClusters):
        means[i] = np.random.rand(n) #the initial mean for each gaussian is chosen randomly
        covs[i] = initialCovariance #the initial covariance of each cluster is the covariance of the data
        priors[i, 0] = 1.0 / numOfClusters #the initial priors are uniformly distributed.

    return [means, covs, priors]

def logLikelihood(data, probabilities): #data is our data. probabilities[i, j] = k means the probability that example i belongs to cluster j is k, with 0 < k < 1
    m = data.shape[0] #num of examples

    examplesByCluster = np.zeros((m, 1))
    for i in range(0, m):
        examplesByCluster[i, 0] = np.argmax(probabilities[i, :])
    examplesByCluster = examplesByCluster.astype(int) #examplesByCluster[i] = j means that example i belongs in cluster j

    result = 0
    for i in range(0, m):
        result += np.log(probabilities[i, examplesByCluster[i, 0]]) #example i belongs in cluster examplesByCluster[i, 0]

    return result

m = 2000 #num of training examples
n = 8 #num of features for each example

data = np.random.rand(m, n)
numOfClusters = 2 #num of gaussians
numIter = 30 #num of iterations of EM
cost = np.zeros((numIter, 1))

[means, covs, priors] = initClusters(numOfClusters, data)

for i in range(0, numIter):
    probabilities = expectation(data, means, covs, priors)
    [means, covs, priors] = maximization(data, probabilities)

    cost[i, 0] = logLikelihood(data, probabilities)

plt.plot(cost)
plt.show()

The problem is that the log-likelihood is behaving strangely. I expect it to be monotonically increasing, but it's not.
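For reference, the quantity EM is guaranteed not to decrease from one iteration to the next is the observed-data log-likelihood, i.e. the sum over examples of the log of the weighted mixture density. A minimal sketch of computing it, reusing the normalPDF helper above:

def dataLogLikelihood(data, means, covs, priors):
    m = data.shape[0]
    numOfClusters = priors.shape[0]
    result = 0
    for i in range(0, m):
        mixture = 0 #weighted sum of the Gaussian densities at example i
        for j in range(0, numOfClusters):
            mixture += priors[j, 0] * normalPDF(data[i, :], means[j], covs[j])
        result += np.log(mixture)
    return result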

For example, with 2000 examples of 8 features and 3 Gaussian clusters, the log-likelihood looks like this (30 iterations):

[plot of the log-likelihood over the 30 iterations]

So this is very bad. But in other tests I ran, for example one test with 15 examples of 2 features and 2 clusters, the log-likelihood is this:

[plot of the log-likelihood for this smaller test]

Better, but still not perfect.

Why does this happen and how can I fix it?

1 Answer

etov (accepted answer):

The problem lies in the maximization step.

The code uses means when calculating covs. However, this is done in the same loop, before means is divided by the sum of probabilities; at that point means[i] is only a partially accumulated, un-normalized sum, so the (data - mean) terms are wrong.

This causes the estimated covariances to explode.

Here's a suggested fix:

def maximization(data, probabilities): #M-step. this updates the means, covariances, and priors of all clusters
    m, n = data.shape
    numOfClusters = probabilities.shape[1]

    means = np.zeros((numOfClusters, n))
    covs = np.zeros((numOfClusters, n, n))
    priors = np.zeros((numOfClusters, 1))

    for i in range(0, numOfClusters):
        priors[i, 0] = np.sum(probabilities[:, i]) / m   #update priors

        for j in range(0, m): #update means
            means[i] += probabilities[j, i] * data[j, :]

        means[i] /= np.sum(probabilities[:, i])

    for i in range(0, numOfClusters):
        for j in range(0, m): #update covs, now using the final means
            vec = np.reshape(data[j, :] - means[i, :], (n, 1))
            covs[i] += probabilities[j, i] * np.multiply(vec, vec.T) #update covs

        covs[i] /= np.sum(probabilities[:, i])

    return [means, covs, priors]

And the resulting cost function (200 data points, 4 features): [plot of the resulting cost function]

EDIT: I was convinced this bug was the only issue in the code; however, after running a few additional examples, I still sometimes see non-monotonic behaviour (although less erratic than before). So this seems to be just part of the problem.

EDIT 2: There was another issue in the covariance calculation: the vector multiplication should be element-wise, not a dot product - keep in mind that the result should be a vector. Results seem to be consistently monotonically increasing now.
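A quick way to check this on the cost array from the question's training loop (a small sketch; the tolerance is just an arbitrary allowance for floating-point noise):

increments = np.diff(cost[:, 0])    #change in cost between consecutive iterations
print(np.all(increments >= -1e-9))  #True if the cost never meaningfully decreases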