I have seen suggestions (see e.g. "Is multiplication and division using shift operators in C actually faster?") that you should not manually replace multiplication with the shift operator, because the compiler should do it automatically and shift operators decrease readability. I wrote a simple test to check this:
import numpy as np
import time

array1 = np.random.randint(size=10 ** 6, low=0, high=10 ** 5)
array2 = np.zeros((10 ** 6,), dtype=int)

# Multiplication by 2
total = 0.0
for i in range(100):
    start = time.perf_counter()
    for j in range(len(array2)):
        array2[j] = array1[j] * 2
    total += time.perf_counter() - start
print("*2 time = " + str(round(total / 10, 5)) + " ms")

# Left shift by 1
total = 0.0
for i in range(100):
    start = time.perf_counter()
    for j in range(len(array2)):
        array2[j] = array1[j] << 1
    total += time.perf_counter() - start
print("<< 1 time = " + str(round(total / 10, 5)) + " ms")

# Floor division by 2
total = 0.0
for i in range(100):
    start = time.perf_counter()
    for j in range(len(array2)):
        array2[j] = array1[j] // 2
    total += time.perf_counter() - start
print("//2 time = " + str(round(total / 10, 5)) + " ms")

# Right shift by 1
total = 0.0
for i in range(100):
    start = time.perf_counter()
    for j in range(len(array2)):
        array2[j] = array1[j] >> 1
    total += time.perf_counter() - start
print(">> 1 time = " + str(round(total / 10, 5)) + " ms")
I used equivalent operations (* 2 is equivalent to << 1, and // 2 to >> 1), and here is the result:
*2 time = 5.15086 ms
<< 1 time = 4.76214 ms
//2 time = 5.17429 ms
>> 1 time = 4.79294 ms
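The equivalence assumed above can be checked directly. This is only a minimal sketch: it uses plain Python integers over the same value range as the benchmark, on the assumption that NumPy integer scalars behave the same way as long as nothing overflows. The names `samples`, `doubled_equal`, and `halved_equal` are made up for this example.

```python
# Same value range as the benchmark data: 0 <= x < 10 ** 5
samples = range(10 ** 5)

# x * 2 and x << 1 should agree for every non-negative integer
doubled_equal = all(x * 2 == x << 1 for x in samples)

# x // 2 (floor division) and x >> 1 should agree as well
halved_equal = all(x // 2 == x >> 1 for x in samples)

print("*2 matches << 1:", doubled_equal)
print("//2 matches >> 1:", halved_equal)
```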
What is wrong? Is my testing method wrong? Is the time measurement wrong? Or does Python not perform such optimizations (and, if so, should I be worried about that)? I used CPython 3.4.2 x64 on Windows 8.1 x64.
This optimization doesn't occur at the bytecode level.
The dis module lets you see what happens "inside" Python when your code runs or, more precisely, exactly which bytecode is executed. Its output shows that the * operator is mapped to BINARY_MULTIPLY and the << operator is mapped to BINARY_LSHIFT. These two bytecode operations are implemented in C.
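For illustration, the mapping can be inspected with a short script. This is a minimal sketch: the function names `times_two`, `shift_left`, and `binary_ops` are made up for this example, and the opcode names printed depend on the interpreter version (CPython 3.11 and later report a single generic `BINARY_OP` with the operator as its argument, instead of `BINARY_MULTIPLY` and `BINARY_LSHIFT`).

```python
import dis


def times_two(x):
    return x * 2


def shift_left(x):
    return x << 1


def binary_ops(func):
    """Return (opname, argrepr) pairs for the binary-operator instructions in func."""
    return [
        (ins.opname, ins.argrepr)
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("BINARY_")
    ]


mul_ops = binary_ops(times_two)
shift_ops = binary_ops(shift_left)

# The two functions compile to different instructions: the multiplication
# is not rewritten into a shift when the bytecode is generated.
print("x * 2  ->", mul_ops)
print("x << 1 ->", shift_ops)
```

If the compiler did replace the multiplication with a shift, both lists would be identical. Calling `dis.dis(times_two)` prints the full disassembly if you want to see the surrounding load and return instructions as well.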