I'm new to Rust and am facing a weird performance issue when returning a Vec from a function to main.
Given this code, elapsed time is 13us on release compile mode.
fn main() -> Result<(), ()> {
// ... Some code for loading data, ends up with:
let series: ChunkedArray<Float32Type>
let start = Instant::now();
let percentiles = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0];
let mut result = Vec::<f32>::with_capacity(percentiles.len());
for p in percentiles {
match series.quantile(p as f64, QuantileInterpolOptions::Lower) {
Ok(v) => result.push(v.unwrap()),
_ => {}
}
}
let duration = start.elapsed();
println!("Time elapsed is: {:?}", duration);
println!("{}",result");
Ok(())
}
Time elapsed is: 44.484µs
That's cool and all, but as soon as I extract the logic into a different function, the elapsed time becomes around 72ms (!). Note that I added another time measurement inside the percentile
function, to ensure that its runtime is still super fast.
fn main() -> Result<(), ()> {
// ... Some code for loading data, ends up with:
let series: ChunkedArray<Float32Type>
let start = Instant::now();
let percentiles = percentile(series);
let duration = start.elapsed();
println!("Time elapsed is: {:?}", duration);
println!("{}",result)
Ok(())
}
pub fn percentile(data: Series) -> Vec<f32> {
let data = data;
let start = Instant::now();
let percentiles = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0];
let mut result = Vec::<f32>::with_capacity(percentiles.len());
for p in percentiles {
match data.quantile(p as f64, QuantileInterpolOptions::Lower) {
Ok(v) => result.push(v.unwrap()),
_ => {}
}
}
let duration = start.elapsed();
println!("internal time: {:?}", duration);
result
}
This result in:
internal time: 13.796µs
Time elapsed is: 73.91404ms
Running this code version in debug results in about 600ms
I'm super confused. Why it seems like retuning the value from a function costs so much? What am I missing?
edit:
Measured how much is cost to drop series
in the first and fast version of the code, comes up to 241µs.
edit 2:
Here is the full output
mr@Ripper:~/git/me/distributor$ cargo run -r
Compiling distributor v0.1.0 (/home/or/git/me/distributor)
Finished release [optimized] target(s) in 3.03s
Running `target/release/distributor`
[0.15219125, 0.977695, 0.9928536, 1.0046335, 1.0219395, 552.4709]
Time elapsed is: 44.484µs
mr@Ripper:~/git/me/distributor$ cargo run -r
Compiling distributor v0.1.0 (/home/or/git/me/distributor)
Finished release [optimized] target(s) in 2.76s
Running `target/release/distributor`
internal time: 13.556µs
Time elapsed is: 73.22045ms
[0.15219125, 0.977695, 0.9928536, 1.0046335, 1.0219395, 552.4709
Since
result
is never used, Rust has likely optimized the body of yourmain
function away. You may be benchmarking this.By putting the code into a function, the Rust optimizer may not realize the function call is not necessary, so it's calling the function, doing the work, and throwing out the result.
You can avoid this by doing something with the result, such as passing the results to
black_box
.