Does Python automatically replace * 2 with << 1?

197 views Asked by At

I saw suggestions (see e.g. Is multiplication and division using shift operators in C actually faster?) that you should not manually replace multiplication with the shift operator, because the compiler has to do it automatically and shift operators decrease readability. I have written a simple test to check this:

import numpy as np
import time

array1 = np.random.randint(size=10 ** 6, low=0, high=10 ** 5)
array2 = np.zeros((10 ** 6,), dtype=np.int)

total = 0.0
for i in range(100):
    start = time.clock()
    for j in range(len(array2)):
        array2[j] = array1[j] * 2
    total += time.clock() - start
print("*2 time = " + str(round(total / 10, 5)) + " ms")


total = 0.0
for i in range(100):
    start = time.clock()
    for j in range(len(array2)):
        array2[j] = array1[j] << 1
    total += time.clock() - start
print("<< 1 time = " + str(round(total / 10, 5)) + " ms")


total = 0.0
for i in range(100):
    start = time.clock()
    for j in range(len(array2)):
        array2[j] = array1[j] // 2
    total += time.clock() - start
print("//2 time = " + str(round(total / 10, 5)) + " ms")


total = 0.0
for i in range(100):
    start = time.clock()
    for j in range(len(array2)):
        array2[j] = array1[j] >> 1
    total += time.clock() - start
print(">> 1 time = " + str(round(total / 10, 5)) + " ms")

I used equivalent operations (* 2 is equivalent to << 1 and // 2 to >> 1) and here is the result:

*2 time = 5.15086 ms
<< 1 time = 4.76214 ms
//2 time = 5.17429 ms
>> 1 time = 4.79294 ms

What is wrong? Is my testing method wrong? Is the time measurement wrong? Or does Python not perform such optimizations (and, if yes, should I be afraid of that)? I used cPython 3.4.2 x64 on Win 8.1 x64.

3

There are 3 answers

1
dlask On BEST ANSWER

This optimization doesn't occur at bytecode level:

>>> import dis
>>> dis.dis(lambda x: x*2)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (2)
              6 BINARY_MULTIPLY
              7 RETURN_VALUE
>>> dis.dis(lambda x: x<<1)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 BINARY_LSHIFT
              7 RETURN_VALUE

The dis module allows you to show you what happens "inside" Python when your code is executed or, more precisely, what exactly is executed. The output shows that the * operator is mapped to BINARY_MULTIPLY and the << operator is mapped to BINARY_LSHIFT. These two bytecode operations are implemented in C.

0
jonrsharpe On

Using dis (to look at the bytecode equivalent of functions) and timeit (more robust timing than trying to do it manually using time) can give you a better idea of what's going on internally. Test script:

def multiply(x):
    return x * 2

def l_shift(x):
    return x << 1

def divide(x):
    return x // 2

def r_shift(x):
    return x >> 1

if __name__ == '__main__':
    import dis
    import timeit

    methods = (multiply, l_shift, divide, r_shift)
    setup = 'from __main__ import {}'.format(
        ', '.join(method.__name__ for method in methods),
    )
    for method in methods:
        print method.__name__
        dis.dis(method)
        print timeit.timeit(
            'for x in range(10): {}(x)'.format(method.__name__),
            setup=setup,
        )
        print

And outputs (CPython v2.7.6 on Windows 7):

multiply
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (2)
              6 BINARY_MULTIPLY     
              7 RETURN_VALUE        
2.22467834797

l_shift
  5           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 BINARY_LSHIFT       
              7 RETURN_VALUE        
2.05381004158

divide
  8           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (2)
              6 BINARY_FLOOR_DIVIDE 
              7 RETURN_VALUE        
2.43717730095

r_shift
 11           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 BINARY_RSHIFT       
              7 RETURN_VALUE        
2.08359396854

Clearly Python is not replacing the multiplication/division operations with the equivalent bit shifts (e.g. BINARY_FLOOR_DIVIDE is not replaced by BINARY_RSHIFT), although it looks like such an optimisation could give performance improvements. As to why the bit shift is faster, see e.g. Speeds of << >> multiplication and division on Programmers.

0
Dunes On

Only in very limited circumstances could CPython implement these optimisations. The reason being that CPython is a ducked-typed language.

Given the code fragment x * 2, this can mean very different things dependent on the value of x. If x is an integer than it does indeed have the same meaning as x << 1. However, if x is a float or a string or a list or any other class that implements __mul__ in its own unique way then it most certainly doesn't have the same meaning as x << 1. For example, "a" * 2 == "aa". So unless the value of x is known at compile time then this optimisation cannot be made. If the value of x is known beforehand then the entire operation can be optimised away eg.

In [3]: import dis

In [4]: def f():
   ...:     return 2 * 2
   ...: 

In [5]: dis.dis(f)
  2           0 LOAD_CONST               2 (4)
              3 RETURN_VALUE

You can see that the compiler has performed the operation itself and just returns the constant value 4.