So the case is the following: I wanted to compare the runtime of a matrix multiplication using ipyparallel against just running it on a single core.
Code for normal execution:
import numpy as np
import time  # needed for the timing below

n = 13
dim_1, dim_2, dim_3, dim_4 = 2**n, 2**n, 2**n, 2**n
A = np.random.random((dim_1, dim_2))
B = np.random.random((dim_3, dim_4))

start = time.time()
C = np.matmul(A, B)
dur = time.time() - start
Well, this takes about 24 seconds on my notebook. To try the same thing in parallel, I first start four engines with ipcluster start -n 4 (I have 4 cores) and then run in my notebook:
from ipyparallel import Client
import time

c = Client()
dview = c[:]  # scatter/gather/push need a DirectView, not a load-balanced view
%px import numpy

def pdot(view_obj, A_mat, B_mat):
    view_obj['B'] = B_mat          # push B to every engine
    view_obj.scatter('A', A_mat)   # split A row-wise across the engines
    view_obj.execute('C = A.dot(B)')
    return view_obj.gather('C', block=True)

start = time.time()
pdot(dview, A, B)
dur1 = time.time() - start
dur1
which takes approximately 34 seconds. Watching the system monitor, I can see that all cores are used in both cases. In the parallel case there seems to be an overhead phase where they aren't at 100 % usage (I suppose that's the part where the arrays get scattered across the engines). In the non-parallel case all cores jump to 100 % usage immediately. This surprises me, as I always thought Python intrinsically ran on a single core.
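For context, the serial case using all cores would be explained if NumPy is linked against a multithreaded BLAS backend (e.g. OpenBLAS or MKL), which parallelizes np.matmul internally. A quick way to check which backend a NumPy build uses:

```python
import numpy as np

# Print the BLAS/LAPACK build configuration; a multithreaded backend
# (OpenBLAS, MKL) listed here would explain np.matmul saturating
# every core even without ipyparallel.
np.show_config()
```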
I'd be happy if somebody could offer more insight into this.
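For what it's worth, here is a sketch of how one could get a genuinely single-core serial baseline for the comparison, assuming the BLAS honours the thread-count environment variables (OpenBLAS and MKL do); they must be set before NumPy is first imported:

```python
# Assumption: the BLAS backend respects these variables; they have to
# be set before NumPy is imported for the first time.
import os
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import time
import numpy as np

n = 10  # 2**10 matrices keep the demo quick; the question uses n = 13
A = np.random.random((2**n, 2**n))
B = np.random.random((2**n, 2**n))

start = time.perf_counter()
C = A @ B
dur_single = time.perf_counter() - start
print(f"single-thread matmul: {dur_single:.3f} s")
```

With the BLAS pinned to one thread, the serial run really measures one core, which makes the speedup (or lack thereof) from the four ipyparallel engines easier to interpret.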