I have two 2D point sets A and B. I want to find the first nearest neighbor in A for each point in B.
However, I am dealing with uncertain points: each point has a mean (a 2D vector) and a 2×2 covariance matrix.
I would thus like to use the Mahalanobis distance, but in scikit-learn (for example) I cannot pass a covariance matrix per point, as it expects a single covariance matrix for the whole set.
Currently, considering only the average locations (i.e. the means of my 2D normal distributions), I have:
from sklearn.neighbors import NearestNeighbors

nearest_neighbors = NearestNeighbors(n_neighbors=1, metric='l2').fit(A)
distance, indices = nearest_neighbors.kneighbors(B)
With my uncertain points, instead of using the L2 norm as a distance, I would rather compute, between a point a in A and a point b in B, their Mahalanobis distance:

d(a, b) = sqrt( transpose(mu_a - mu_b) * C * (mu_a - mu_b) )

where C = inv(cov_a + cov_b), and where mu_a (resp. mu_b) and cov_a (resp. cov_b) are the 2D mean and 2×2 covariance matrix of uncertain point a (resp. b).
I ended up using a custom distance. Thus a point has 4 features: the x and y coordinates, and the x and y variances (the covariance matrix is diagonal in my case).
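As a sketch of that custom-distance approach (the function name and example data are mine, not from the original): each row packs [x, y, var_x, var_y], and since the covariances are diagonal, inverting their sum reduces to elementwise reciprocals, so scikit-learn's `NearestNeighbors` can take the metric as a plain Python callable.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def uncertain_mahalanobis(a, b):
    """Mahalanobis distance between two uncertain 2D points packed as
    [x, y, var_x, var_y], i.e. sqrt((mu_a-mu_b)^T inv(cov_a+cov_b) (mu_a-mu_b))."""
    diff = a[:2] - b[:2]
    # Both covariance matrices are diagonal, so the inverse of their
    # sum is just the elementwise reciprocal of the summed variances.
    inv_var = 1.0 / (a[2:] + b[2:])
    return np.sqrt(np.sum(diff * diff * inv_var))

# Example data: rows are [x, y, var_x, var_y].
A = np.array([[0.0, 0.0, 0.5, 0.5],
              [5.0, 5.0, 1.0, 1.0]])
B = np.array([[4.0, 4.5, 0.5, 0.5]])

nn = NearestNeighbors(n_neighbors=1, metric=uncertain_mahalanobis).fit(A)
distance, indices = nn.kneighbors(B)
```

Note that a callable metric forces scikit-learn to evaluate it in pure Python (brute force or BallTree), which is much slower than the built-in metrics; for large sets it may be worth vectorizing the pairwise computation directly in NumPy.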