Why is my Python code 3x slower than my Matlab code? (matrix multiplication and inversion)

82 views Asked by At

My python code is a lot slower than my matlab code.

My python code is:

import time 
from scipy import linalg
import numpy as np

N=1521
dt=0.1
thet=0.5
A0 = (np.linspace(1,N,N)).reshape(N,1)
A0 = np.repeat(A0,N,axis=1)
A1 = (np.linspace(1,N,N)).reshape(N,1)
A1 = np.repeat(A1,N,axis=1)
A2 = (np.linspace(1,N,N)).reshape(N,1)
A2 = np.repeat(A2,N,axis=1)
U = (np.linspace(1,N,N)).reshape(N,1)

start=time.time()
for t in range(19):
    u=U
    Y0 = (np.eye(N) + dt*(A0+A1+A2)) @ u
    Y1 = linalg.inv(np.eye(N) -thet * dt*A1 ) @ (Y0 -thet *dt*A1 @ u) 
    Y2 = linalg.inv(np.eye(N) -thet * dt*A2 ) @ (Y1 -thet *dt*A2 @ u) 
    U=Y2
print(time.time() - start)

output: 12.160803079605103

Whilst my matlab code is:

clear all

N=1521;
dt=0.1;
thet=1;

A0=linspace(1,N,N)';
A0=repmat(A0,1,N);
A1=linspace(1,N,N)';
A1=repmat(A1,1,N);
A2=linspace(1,N,N)';
A2=repmat(A2,1,N);
U = linspace(1,N,N)';
I = eye(N);
tic;
for t=1:19
    u  = U;
    Y0 = (I + dt.*(A0+A1+A2))*u;
    Y1 = (I - thet.*dt.*A1) \ (Y0 - thet.*dt.*A1*u);
    Y2 = (I - thet.*dt.*A2) \ (Y1 - thet.*dt.*A2*u);
    U=Y2;
end
disp(toc)

output: 4.1449

I know matlab will be faster with inverting matrices, but not 3x faster than python. I also see that the matrix U for both is different but cannot spot the difference.

Is there something glaringly wrong in my python code?

1

There are 1 answers

2
Mechanic Pig On

The A \ b in Matlab should be semantically equivalent to solving a system of linear equations A * x = b, and should not calculate the inverse of A internally. A more appropriate approach would be to use numpy.linalg.solve or scipy.linalg.solve:

import numpy as np
from numpy import linalg

N=1521
dt=0.1
thet=0.5
A0 = (np.linspace(1,N,N)).reshape(N,1)
A0 = np.repeat(A0,N,axis=1)
A1 = (np.linspace(1,N,N)).reshape(N,1)
A1 = np.repeat(A1,N,axis=1)
A2 = (np.linspace(1,N,N)).reshape(N,1)
A2 = np.repeat(A2,N,axis=1)
U = (np.linspace(1,N,N)).reshape(N,1)
I = np.eye(N)

for t in range(19):
    u=U
    Y0 = (I + dt*(A0+A1+A2)) @ u
    Y1 = linalg.solve(I -thet * dt*A1, Y0 -thet *dt*A1 @ u)
    Y2 = linalg.solve(I -thet * dt*A2, Y1 -thet *dt*A2 @ u)
    U=Y2

The new code using np.linalg.solve has nearly 35% acceleration (this value is not very stable because both methods have high CPU usage and can fluctuate greatly in runtime) on my machine:

In [_]: %%timeit
   ...: # loop part of old code
   ...: # ...
9.08 s ± 195 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [_]: %%timeit
   ...: # loop part of new code
   ...: # ...
5.89 s ± 219 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)