Best practice for using common subexpression elimination with lambdify in SymPy

2.8k views Asked by At

I'm currently attempting to use SymPy to generate and numerically evaluate a function and its gradient. For simplicity, I'll use the following function as an example (keeping in mind that the real function is much lengthier):

import sympy as sp
def g(x):
    return sp.cos(x) + sp.cos(x)**2 + sp.cos(x)**3

It's easy enough to numerically evaluate this function and its derivative:

import numpy as np
g_expr = sp.lambdify(x,g(x),modules='numpy')
dg_expr = sp.lambdify(x,sp.diff(g(x)),modules='numpy')

print g_expr(np.linspace(0,1,50))
print dg_expr(np.linspace(0,1,50))

For my real function, however, lambdify is slow, both in terms of generating the numerical function and in its evaluation. As many elements in my function are similar, I'd like to use common subexpression elimination (cse) within lambdify to speed this process up. I know that SymPy has a built-in function to perform cse,

>>> print sp.cse(g(x))
([(x0, cos(x))], [x0**3 + x0**2 + x0])

but don't know what syntax to use in order to utilize this result in my lambdify function (where I'd still like to use x as my input argument):

>>> g_expr_fast = sp.lambdify(x,sp.cse(g(x)),modules='numpy')
>>> print g_expr_fast(np.linspace(0,1,50))
Traceback (most recent call last):
  File "test3.py", line 34, in <module>
    print g_expr1(nx1)
  File "<string>", line 1, in <lambda>
NameError: global name 'x0' is not defined

Any help on how to use cse in lambdify would be appreciated. Or, if there are better ways to speed up my gradient calculations, I'd appreciate hearing those as well.

In case it's relevant, I'm using Python 2.7.3 and SymPy 0.7.6.

3

There are 3 answers

1
Bort On

So this might not be the most optimal way to do it, but for my small example it works.

The idea of following code is lambdifying each common subexpression and generate a new function with possibly all arguments. I added a few additional sin and cos terms to add possible dependencies from previous subexpressions.

import sympy as sp
import sympy.abc
import numpy as np
import matplotlib.pyplot as pl

def g(x):
    return sp.cos(x) + sp.cos(x)**2 + sp.cos(x)**3 + sp.sin(sp.cos(x)+sp.sin(x))**4 + sp.sin(x) - sp.cos(3*x) + sp.sin(x)**2

repl, redu=sp.cse(g(sp.abc.x))

funs = []
syms = [sp.abc.x]
for i, v in enumerate(repl):
    funs.append(sp.lambdify(syms,v[1],modules='numpy'))
    syms.append(v[0])

glam = sp.lambdify(syms,redu[0],modules='numpy')

x = np.linspace(-1,5,10)
xs=[x]

for f in funs:
    xs.append(f(*xs))

print glam(*xs)
glamlam = sp.lambdify(sp.abc.x,g(sp.abc.x),modules='numpy')
print glamlam(x)
print np.allclose(glamlam(x),glam(*xs))

repl contains:

[(x0, cos(x)), (x1, sin(x)), (x2, x0 + x1)]

and redu contains

[x0**3 + x0**2 + x1**2 + x2 + sin(x2)**4 - cos(3*x)]

So funs contains all subexpression lambdified and the list xs contains each subexpression evaluated, such one can feed glam properly in the end. xs grows with each subexpression and might turn out to be a bottle neck in the end.

You can do the same approach on the expression of sp.cse(sp.diff(g(sp.abc.x))).

0
Niko Fohr On

The speed of the calculations can be increased:

  • A bit (36x improvement in my example) by extracting the common factors and not repeating calculations.
  • A bigger speedup (up to 100x-1000x) can be obtained with for example by creating an extension module with numba.

Foreword

I assume that this is "calculate function in sympy once and use it many times in different project later" type of case. Therefore, there is some manual copy-pasting and creating of files included. However it is possible to automate the creation of the new files with the functions, and also the compilation step, but I leave that out from this now.

I had a similar problem and I did some benchmarking on different methods. The function I've used is quite long (len(str(expr)) = 45857) and cse(expr) decomposes it into 72 subexpressions. It is too long to copy-paste here but here are the steps to make up to 100x-1000x speed increase to functions created with sympy.

Benchmarks

A) Evaluating single float
Time to evaluate the function with one float value for each parameter. Using timeit myfunc(*params).

  • Baseline: lambdify with modules="numpy": 277µs
  • (1) copy-paste of str(expr) to function definition: 275µs (no difference)
  • (2) Copy-paste of expression after cse: 8.2 µs (33x improvement)
  • (3) Copy-paste of expression after cse(optimizations="basic"): 7.6µs (36x improvement)
  • (4) Use numba to compile the code to func_numba_f(): 0.25µs (1090x improvement)
  • (5) Use the sympy autowrap: 0.47 µs (589x improvement)

B) Evaluating np.array of 1000 floats

  • (1) copy-paste of str(expr) to function definition: 15100 µs | 15.1µs per value
  • (2) Copy-paste of expression after cse: 493 µs | 0.49µs per value (31x improvement)
  • (3) Copy-paste of expression after cse(optimizations="basic"): 413µs | 0.41µs per value (37x improvement)
  • (4) Use numba to compile the code to func_numba_arr(): 114µs | 0.11µs per value (132x improvement)
  • (5) sympy autowrap with np.vectorize: 480µs | 0.48µs per value (31x improvement)

(1) Copy-paste of str(expr)

  • Just copy paste the expression string into new function, and return the value.
  • Save the function in another file.

(2) Copy-paste of expression after cse

  • Idea: make the code much shorter by identifying common parts.
  • First, copy-paste the common parts:
repl, redu = cse(K)
for variable, expr in repl:
    print(f"{variable} = {expr}")
  • Then, copy-paste the return value: print(redu[0]).
  • Create another file, and paste into function definition

(3) Copy-paste of expression after cse(optimizations="basic")

  • Same as (2) but with optimizations="basic"
  • This creates slightly shorted code than (2)

(4) Use numba to compile the code

  • Use the numba.pycc.CC to compile the code. So, create a function with copy-pasting as in (3)
  • Then, create src_mymodule.py with code
from numba.pycc import CC

cc = CC("my_numba_module")


@cc.export("func_numba_f", "f8(f8, f8, f8, f8, f8)")
@cc.export("func_numba_arr", "f8[:](f8[:],f8[:],f8[:],f8[:],f8[:])")
def myfunc(x1, x2, x3, x4, x5):
    # your function definition here
    return value

if __name__ == "__main__":
    cc.compile()
  • In function func_numba_f() there are five float valued input variables and one float valued output variable. The f8 means float.
  • The func_numba_arr() is the version that handle np.arrays with dtype="float64" or dtype="float32", depending what you use to compile it.
  • You then compile the code once by running python src_mymodule.py. This will create my_numba_module.cp38-win_amd64.pyd or similar. It can be only used with the same python version and bitness as in the filename.
  • Then, in another python file, you would import the functions and use them, like:
from my_numba_module import func_numba_f, func_numba_arr

out = func_numba_f(4,3,2,1,100)

# or:
args = [np.array([x]*N, dtype='float64') for x in (4,3,2,1,100)]
out_arr = func_numba_arr(*args)

(5) Use the sympy autowrap

  • This was easy. After installing and configuring Cython, I needed two lines of code to create a function with autowrap.
from sympy.utilities.autowrap import autowrap
func = autowrap(expr, backend='cython')
  • By specifying the temp_dir argument, it saves all the source files (.c, .h, .pyx) and a .pyd (win) / .so (unix) file that can be used to import the function later on with (assuming the temp_dir is in sys.path):
from wrapper_module_1 import autofunc_c
  • This was by far the easiest method although the produced C-code is not highly optimized. The output of autowrap could be renamed in additional steps, if needed.
  • The function accepts scalars only but can be vectorized with np.vectorize
func = np.vectorize(func)
0
moorepants On

As of SymPy 1.9, lambdify can apply common subexpression elimination with the kwarg cse=True.

See: https://docs.sympy.org/latest/modules/utilities/lambdify.html?highlight=lambdify#sympy.utilities.lambdify.lambdify