# How can I speed up kernel function calculations using numpy einsum and numexpr?

I am trying to define a few well-known kernels, such as RBF, hyperbolic tangent, and Fourier, for the `svm.SVR` method in the `sklearn` library. I started with `rbf` (I know svm has a default RBF kernel, but I need to define my own so I can customize it later), found some useful links here, and chose this one:

```
import numpy as np
from sklearn.svm import SVR

def my_kernel(X, Y):
    # Naive RBF kernel: one exp(-||x - y||^2) per pair of rows
    K = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            K[i, j] = np.exp(-1 * np.linalg.norm(x - y) ** 2)
    return K

clf = SVR(kernel=my_kernel)
```

I used this one because I could apply it to my training data (shape [3850, 4]) and test data (shape [1200, 4]), which have different shapes. The problem is that it's too slow, and I have to wait a long time for results. I even used static typing and memoryviews in Cython, but its performance still isn't as good as the default `svm` RBF kernel. I also found this link about the same problem, but working with `numpy.einsum` and `numexpr.evaluate` is a little confusing for me. It turned out that this was the fastest code there:

```
import numpy as np
import numexpr as ne
from scipy.linalg.blas import sgemm

def app2(X, gamma, var):
    # -gamma * ||x||^2 for every row of X
    X_norm = -gamma * np.einsum('ij,ij->i', X, X)
    return ne.evaluate('v * exp(A + B + C)', {
        'A': X_norm[:, None],
        'B': X_norm[None, :],
        'C': sgemm(alpha=2.0 * gamma, a=X, b=X, trans_b=True),
        'v': var,
    })
```
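As far as I can tell, `np.einsum('ij,ij->i', X, X)` just computes the squared L2 norm of every row in one pass; this quick check convinced me:

```
import numpy as np

X = np.random.rand(4, 3)
# 'ij,ij->i' multiplies X elementwise with itself and sums over j,
# i.e. the row-wise squared norms ||x_i||^2.
assert np.allclose(np.einsum('ij,ij->i', X, X), (X ** 2).sum(axis=1))
```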

The `app2` code works for just one input (`X`), and I couldn't find a way to modify it for my case (two inputs with two different sizes; according to the svm docs, the kernel function gets matrices of shape (m, n) and (l, n) and outputs (m, l)). I guess I only need to move the `K[i,j] = np.exp(-1*np.linalg.norm(x-y)**2)` logic from the first code into the second one to speed it up. Any help would be appreciated.

## Three possible variants

Variants 1 and 3 make use of

(a-b)^2 = a^2 + b^2 - 2ab

as described here or here. Applied row-wise to the data matrices, this replaces the pairwise distance loops with two vectors of squared norms and a single matrix product. For special cases like a small second dimension, Variant 2 is also fine.
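As a quick numerical check of that identity applied to whole matrices (a sketch, using `scipy.spatial.distance.cdist` as the reference):

```
import numpy as np
from scipy.spatial.distance import cdist

X = np.random.rand(5, 4)
Y = np.random.rand(3, 4)

# ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x.y, computed for all row pairs:
sq_dists = (np.einsum('ij,ij->i', X, X)[:, None]    # ||x||^2 as a column
            + np.einsum('ij,ij->i', Y, Y)[None, :]  # ||y||^2 as a row
            - 2.0 * X @ Y.T)                        # cross terms via one matmul

assert np.allclose(sq_dists, cdist(X, Y, 'sqeuclidean'))
```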

```
import numpy as np
import numba as nb
import numexpr as ne
from scipy.linalg.blas import sgemm

def vers_1(X, Y, gamma, var):
    # Row-wise squared norms, pre-scaled by -gamma
    X_norm = -gamma * np.einsum('ij,ij->i', X, X)
    Y_norm = -gamma * np.einsum('ij,ij->i', Y, Y)
    # exp(-g*||x||^2 - g*||y||^2 + 2*g*x.y) = exp(-g*||x-y||^2)
    return ne.evaluate('v * exp(A + B + C)', {
        'A': X_norm[:, None],
        'B': Y_norm[None, :],
        'C': sgemm(alpha=2.0 * gamma, a=X, b=Y, trans_b=True),
        'v': var,
    })

# Maybe easier to read, but slow if X.shape[1] gets bigger
@nb.njit(fastmath=True, parallel=True)
def vers_2(X, Y):
    K = np.empty((X.shape[0], Y.shape[0]), dtype=X.dtype)
    for i in nb.prange(X.shape[0]):
        for j in range(Y.shape[0]):
            sum = 0.
            for k in range(X.shape[1]):
                sum += (X[i, k] - Y[j, k]) ** 2
            K[i, j] = np.exp(-1 * sum)
    return K

@nb.njit(fastmath=True, parallel=True)
def vers_3(A, B):
    dist = np.dot(A, B.T)  # cross terms a.b for all row pairs

    # Row-wise squared norms of A
    TMP_A = np.empty(A.shape[0], dtype=A.dtype)
    for i in nb.prange(A.shape[0]):
        sum = 0.
        for j in range(A.shape[1]):
            sum += A[i, j] ** 2
        TMP_A[i] = sum

    # Row-wise squared norms of B
    TMP_B = np.empty(B.shape[0], dtype=A.dtype)
    for i in nb.prange(B.shape[0]):
        sum = 0.
        for j in range(B.shape[1]):
            sum += B[i, j] ** 2
        TMP_B[i] = sum

    # exp(-(||a||^2 + ||b||^2 - 2*a.b)) = exp(-||a-b||^2)
    for i in nb.prange(A.shape[0]):
        for j in range(B.shape[0]):
            dist[i, j] = np.exp((-2. * dist[i, j] + TMP_A[i] + TMP_B[j]) * -1)
    return dist
```

Timings

```
gamma = 1.
var = 1.
X = np.random.rand(3850, 4)
Y = np.random.rand(1200, 4)

res_1 = vers_1(X, Y, gamma, var)
res_2 = vers_2(X, Y)
res_3 = vers_3(X, Y)
np.allclose(res_1, res_2)
np.allclose(res_1, res_3)

%timeit res_1 = vers_1(X, Y, gamma, var)
19.8 ms ± 615 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit res_2 = vers_2(X, Y)
16.1 ms ± 938 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit res_3 = vers_3(X, Y)
13.5 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
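To actually plug one of these into `SVR`, the kernel just has to be a callable of two matrix arguments, so `gamma` and `var` can be fixed up front (a minimal sketch; the parameter values are placeholders, not tuned):

```
from functools import partial
from sklearn.svm import SVR

# SVR calls kernel(X, Y), where X and Y may have different row counts
# during fit vs. predict; all three variants handle that.
clf = SVR(kernel=partial(vers_1, gamma=1.0, var=1.0))
```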

You can also just pipe your original kernel through Pythran.

kernel.py:

```
import numpy as np

#pythran export my_kernel(float64[:,:], float64[:,:])
def my_kernel(X, Y):
    K = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            K[i, j] = np.exp(-1 * np.linalg.norm(x - y) ** 2)
    return K
```

Compilation step:

```
> pythran kernel.py
```

There's no rewriting step (you need to put the kernel in a separate file though) and the acceleration is significant: 19 times faster on my laptop, using

```
> python -m timeit -s 'from numpy.random import random; x = random((100,100)); y = random((100,100)); from kernel import my_kernel as k' 'k(x,y)'
```

to gather timings.
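
If you want to verify the compiled extension against a plain-NumPy reference, a quick check along these lines works (a sketch; after compilation the import resolves to the generated extension module):

```
import numpy as np
from kernel import my_kernel  # the pythran-compiled extension

X = np.random.rand(50, 4)
Y = np.random.rand(30, 4)

# Broadcasting reference: exp(-||x - y||^2) for every row pair
ref = np.exp(-np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2) ** 2)
assert np.allclose(my_kernel(X, Y), ref)
```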