Python: Calculate likelihood of a value for N number of multivariate normal distributions

2.7k views Asked by At

So I have a a set of N multivariate normal distributions, which all have the same covariance. For each of these distributions, I want to calculate the likelihood of getting the value x.

For a single distribution, and multiple number of "x" values, this is trivial

from scipy.stats import multivariate_normal
import numpy as np

cov = [[1 ,0.1],[0.1 ,1]]
mean = [0,0]
Values = np.random.multivariate_normal([0,0],cov,samp)
print  multivariate_normal.pdf(Values, mean, cov)

Now, if we reverse this, and assume we only have one Value to check for, but multiple means but same covariance each time. Like the following (Of course in the real case, the mean is different in each iteration)

means = [mean]*samples
Value = Values[0,:]

L = []
for iMean in means:
    L.append(multivariate_normal.pdf(Value, iMean, cov))

print L

Is there a better way of doing this? If there is any difference, then assuming that the covariance matrix is uncorrelated is also allowed, although a general solution is preferable.

1

There are 1 answers

0
A. L. On BEST ANSWER

You can first calculate the squared mahalanobis distance for all distributions. https://en.wikipedia.org/wiki/Mahalanobis_distance

Afterwards you calculate the probablilty density.

*https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.multivariate_normal.html *https://en.wikipedia.org/wiki/Multivariate_normal_distribution

By using numpy arrays you can avoid slow python loops. I added this to your example:

from scipy.stats import multivariate_normal
import numpy as np

cov = [[1 ,0.5],[0.5 ,1]]
mean = [2,2]

samples = 10
means = [mean]*samples

Value = (3,2.5)

L = []
for iMean in means:
    L.append(multivariate_normal.pdf(Value, iMean, cov))



mean_array = np.array(means)
value_array = np.array(Value).astype(np.float)
cov_array = np.array(cov)
inv_cov_array = np.linalg.inv(cov_array)
dim = cov_array.shape[0]

diffs = value_array-mean_array
maha_distances = np.sum(diffs.transpose()*np.dot(inv_cov_array,diffs.transpose()),axis=0)    
denominator = 1/np.sqrt((2*np.pi)**dim*np.linalg.det(cov_array))

l = denominator * np.exp(-0.5*maha_distances)

res_dif = np.array(L) - l
print res_dif