Rust package compiled with Maturin for use in Python causes significant runtime overhead compared to pure Rust

I am trying to speed up some computationally expensive parts of my Python code by rewriting them in Rust with PyO3 and using Maturin to build a Python package. However, I noticed that the Rust implementation was more than 3 times slower than the implementation I had in Python using NumPy.

I do expect some overhead from moving data between the languages, but because the discrepancy in speed is so large, I wanted to determine whether it was also due to the Rust code itself. I implemented the same test case in pure Rust, then profiled both the Python and pure Rust cases using perf, and found something I don't understand.

The same calls to the internal function run almost 20 times slower in the Python-packaged code than in the native Rust code!

This can be seen by comparing the profiler output for the Python-packaged code with that for the native Rust code. I would at least expect the internal Rust functions to run identically in the two scenarios, because surely PyO3 is nothing more than a wrapper around the same compiled Rust code.

I do not know if this is a mistake in how I am building the Rust package (e.g. Maturin not optimizing to the same extent that cargo does), an expected issue, or something else entirely. Any insight into this is appreciated.

Testing details

All code mentioned here can be found in this testing branch of my GitHub project. The test cases correspond to running the following (from the root directory of the project):

Python packaged

% perf record -F9999 --call-graph dwarf python rust_test.py
% perf script -F +pid > test_py.perf   

Pure Rust

% cargo build --release 
% perf record -F9999 --call-graph dwarf target/release/tpx3_toolkit
% perf script -F +pid > test_rust.perf  

These *.perf files can be found in the benchmarks/rust directory of the GitHub project.

The Rust package was built the typical way for Maturin: maturin develop
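Since the build profile is raised above as a possible cause, one way to rule it out is to have the compiled code report how it was built. The sketch below is only an illustration: `build_profile` is a hypothetical helper that is not part of the project, and exposing it to Python (for example through a PyO3 `#[pyfunction]`) is assumed rather than shown. It relies on `debug_assertions` being enabled in Cargo's default `dev` profile and disabled in the default `release` profile.

```rust
/// Hypothetical helper (not from the project): reports whether this crate
/// was compiled with debug assertions enabled.
/// Cargo's default `dev` profile enables `debug_assertions`;
/// the default `release` profile disables them.
pub fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}
```

If the Python-packaged build reports "debug" while the `cargo build --release` binary reports "release", the two runs are not comparing the same machine code. For reference, `cargo build --release` selects the release profile explicitly, and `maturin develop` accepts a `--release` flag for the same purpose.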

Edit

It was brought to my attention by tadman in the comments that I was not necessarily comparing apples to apples with my timing comparisons, so I made some extra changes. I added internal timing to the parse function to compare it directly between the pure Rust and Python package implementations. I am also now testing with the largest file I currently have access to, in order to minimize the relative overhead of crossing the language boundary and focus on how the same code behaves in the two contexts. After doing so, I observed the following results:
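For readers who want to reproduce this kind of measurement, internal timing can be sketched with `std::time::Instant`. This is a minimal sketch, not the project's actual code: `timed` and `timed_parse` are hypothetical names, and the `i_parse` body here is a placeholder that only sums bytes so the example is self-contained.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Placeholder standing in for the real parser (assumption: the real
/// `i_parse` has a different signature and does real work).
pub fn i_parse(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

/// Calls the parser and prints the time spent inside it.
pub fn timed_parse(data: &[u8]) -> u64 {
    let (total, elapsed) = timed(|| i_parse(data));
    // `{:.2?}` prints a Duration with two decimals and a unit suffix.
    println!("Time inside Rust for i_parse: {:.2?}", elapsed);
    total
}
```

Because the timer starts and stops inside Rust, the figure excludes interpreter startup and any conversion of arguments or return values at the Python boundary.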

Python package:

Time inside Rust for i_parse: 3.83s

Pure Rust:

Time inside Rust for i_parse: 213.97ms

I repeated this with another file around 1/20th the size of the one above:

Python package:

Time inside Rust for i_parse: 195.56ms

Pure Rust:

Time inside Rust for i_parse: 10.64ms

Thus, there is still a consistent slowdown of roughly 15-20 times between pure Rust and the Python-packaged Rust.
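The ratio can be checked directly from the four timings reported above. In this small sketch, `slowdown` is a hypothetical helper and the values are the measurements converted to milliseconds.

```rust
/// Hypothetical helper: how many times slower the Python-packaged run is,
/// given both timings in milliseconds.
pub fn slowdown(python_pkg_ms: f64, pure_rust_ms: f64) -> f64 {
    python_pkg_ms / pure_rust_ms
}

/// Slowdown factors for the two test files, using the timings reported above.
pub fn reported_slowdowns() -> (f64, f64) {
    let large_file = slowdown(3830.0, 213.97); // 3.83 s vs 213.97 ms
    let small_file = slowdown(195.56, 10.64); // 195.56 ms vs 10.64 ms
    (large_file, small_file)
}
```

Both factors come out at roughly 18, so the slowdown stays about the same even though the input size changes by about a factor of 20. That points to a per-operation cost rather than a fixed call overhead.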
