I am curious as to why the following cdist calls
differ so much in time even though they produce the same results:
import numpy as np
from scipy.spatial.distance import cdist
x = np.random.rand(10_000_000, 50)
y = np.random.rand(50)
result_1 = cdist(x, y[np.newaxis, :])
result_2 = cdist(x, y[np.newaxis, :], 'minkowski', p=2.)
Computing result_1 is significantly faster than computing result_2.
The C implementation of the Euclidean distance (source lines 50-66) uses multiplication and a sqrt() call, while the Minkowski distance (source lines 381-391) is based on the much slower calls to the pow() function. For reference, see the discussions here and here comparing pow to multiplication and sqrt.
So despite the appearance that the Euclidean norm just calls the Minkowski norm (source line 614), cdist actually calls directly through to the C implementation, where the code is different. The Python euclidean function is not called in the actual execution.
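To make the difference between the two formulations concrete, here is a minimal NumPy sketch. It is not SciPy's C code: the function names euclidean_mul_sqrt and minkowski_pow are hypothetical, and the array is smaller than in the question so it runs quickly. It only illustrates that "multiply, sum, sqrt" and "pow, sum, pow" give the same distances for p=2 while doing different work per element.

```python
import timeit

import numpy as np


def euclidean_mul_sqrt(x, y):
    """Euclidean distance per row: multiply, sum, then one sqrt per row."""
    d = x - y
    return np.sqrt(np.einsum("ij,ij->i", d, d))


def minkowski_pow(x, y, p=2.0):
    """Minkowski distance per row: general power calls on every element."""
    d = np.abs(x - y)
    return np.power(np.power(d, p).sum(axis=1), 1.0 / p)


rng = np.random.default_rng(0)
x = rng.random((100_000, 50))  # smaller than the 10_000_000 rows in the question
y = rng.random(50)

d_mul = euclidean_mul_sqrt(x, y)
d_pow = minkowski_pow(x, y, p=2.0)

# Both formulations agree up to floating-point rounding.
print(np.allclose(d_mul, d_pow))

# Rough timing comparison; the numbers depend on hardware and NumPy version.
t_mul = timeit.timeit(lambda: euclidean_mul_sqrt(x, y), number=5)
t_pow = timeit.timeit(lambda: minkowski_pow(x, y, p=2.0), number=5)
print(f"mul+sqrt: {t_mul:.3f}s  pow: {t_pow:.3f}s")
```

The timings from this sketch are only indicative, since NumPy may optimize some power calls internally and its overheads differ from those of the compiled loops that cdist uses. For the real comparison, time the two cdist calls from the question directly.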