I'm doing a simple Monte Carlo simulation exercise using the ipcluster engines of IPython. I've noticed a huge difference in execution time depending on how I define my function, and I'd like to understand the reason. Here are the details:
When I define the task as below, it is fast:
def sample(n):
    return (rand(n)**2 + rand(n)**2 <= 1).sum()
When run in parallel:
from IPython.parallel import Client
rc = Client()
v = rc[:]
with v.sync_imports():
    from numpy.random import rand
n = 1000000
timeit -r 1 -n 1 print 4.* sum(v.map_sync(sample, [n]*len(v))) / (n*len(v))
3.141712
1 loops, best of 1: 53.4 ms per loop
But if I change the function to:
def sample(n):
    return sum(rand(n)**2 + rand(n)**2 <= 1)
I get:
3.141232
1 loops, best of 1: 3.81 s per loop
...which is about 71 times slower. What could be the reason for this?
I can't go too in-depth, but the reason it is slower is that
sum(<array>)
is the built-in CPython sum function, which iterates over the array one element at a time in the interpreter, whereas
<numpy array>.sum()
uses the NumPy sum method, which is substantially faster than the built-in Python version. I imagine you would get timings similar to the fast version if you replaced
sum(<array>)
with
numpy.sum(<array>)
See the NumPy sum docs here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html
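If you want to check this without the cluster, here is a minimal local sketch that times both ways of summing the same boolean array. It is only an illustration: the array size, the seed, the variable names, and the use of numpy.random.default_rng (instead of rand from the question) are my own choices, and the exact timings will vary by machine.

```python
import timeit

import numpy as np

# Assumption: a smaller n than the question's 1000000, just to keep the demo quick.
n = 100000
rng = np.random.default_rng(0)  # hypothetical seed, chosen only for repeatability

# Boolean array: True where the random point falls inside the unit quarter circle.
inside = rng.random(n) ** 2 + rng.random(n) ** 2 <= 1

# Built-in sum: a Python-level loop that adds one NumPy scalar at a time.
builtin_total = sum(inside)

# ndarray.sum: a single reduction carried out inside NumPy's compiled code.
numpy_total = inside.sum()

# Both approaches count the same number of hits; only the speed differs.
assert builtin_total == numpy_total

t_builtin = timeit.timeit(lambda: sum(inside), number=3)
t_numpy = timeit.timeit(lambda: inside.sum(), number=3)

print("pi estimate:", 4.0 * numpy_total / n)
print("built-in sum: %.4f s for 3 runs" % t_builtin)
print("ndarray.sum:  %.4f s for 3 runs" % t_numpy)
```

On a typical machine the built-in version should come out far slower, which matches the gap seen in the question. numpy.sum(inside) should behave like inside.sum() here.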