Returning a Vec<f32> from a function costs a lot?


I'm new to Rust and am facing a weird performance issue when returning a Vec from a function to main.

Given this code, the elapsed time is about 44µs in release mode.

fn main() -> Result<(), ()> {
    // ... Some code for loading data, ends up with:
    let series: ChunkedArray<Float32Type> = /* ... */;

    let start = Instant::now();
    let percentiles = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0];
    let mut result = Vec::<f32>::with_capacity(percentiles.len());
    for p in percentiles {
        match series.quantile(p as f64, QuantileInterpolOptions::Lower) {
            Ok(v) => result.push(v.unwrap()),
            _ => {}
        }
    }
    let duration = start.elapsed();
    println!("Time elapsed is: {:?}", duration);
    println!("{:?}", result);
    Ok(())
}

Time elapsed is: 44.484µs

That's cool and all, but as soon as I extract the logic into a different function, the elapsed time becomes around 72ms (!). Note that I added another time measurement inside the percentile function, to ensure that its runtime is still super fast.

fn main() -> Result<(), ()> {
    // ... Some code for loading data, ends up with:
    let series: ChunkedArray<Float32Type> = /* ... */;

    let start = Instant::now();
    let percentiles = percentile(series);
    let duration = start.elapsed();
    println!("Time elapsed is: {:?}", duration);
    println!("{:?}", percentiles);
    Ok(())
}

pub fn percentile(data: Series) -> Vec<f32> {
    let start = Instant::now();
    let percentiles = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0];
    let mut result = Vec::<f32>::with_capacity(percentiles.len());
    for p in percentiles {
        match data.quantile(p as f64, QuantileInterpolOptions::Lower) {
            Ok(v) => result.push(v.unwrap()),
            _ => {}
        }
    }
    let duration = start.elapsed();
    println!("internal time: {:?}", duration);

    result
}

This results in:

internal time: 13.796µs

Time elapsed is: 73.91404ms

Running this version of the code in debug mode takes about 600ms.

I'm super confused. Why does returning the value from a function cost so much? What am I missing?

edit: Measured how much it costs to drop series in the first (fast) version of the code; it comes to 241µs.
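For reference, one way to take such a measurement is to call drop explicitly between two Instant readings. This is a minimal sketch; it uses a plain Vec<f32> as a stand-in for the Polars ChunkedArray, so the allocation size and type are assumptions, not the original data:

```rust
use std::time::Instant;

fn main() {
    // Stand-in for the loaded series: one large heap allocation.
    let series: Vec<f32> = vec![0.0; 10_000_000];

    let start = Instant::now();
    drop(series); // force the deallocation to happen here, not at end of scope
    println!("drop took: {:?}", start.elapsed());
}
```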

edit 2:

Here is the full output

mr@Ripper:~/git/me/distributor$ cargo run -r
   Compiling distributor v0.1.0 (/home/or/git/me/distributor)
    Finished release [optimized] target(s) in 3.03s
     Running `target/release/distributor`
[0.15219125, 0.977695, 0.9928536, 1.0046335, 1.0219395, 552.4709]
Time elapsed is: 44.484µs
mr@Ripper:~/git/me/distributor$ cargo run -r
   Compiling distributor v0.1.0 (/home/or/git/me/distributor)
    Finished release [optimized] target(s) in 2.76s
     Running `target/release/distributor`
internal time: 13.556µs
Time elapsed is: 73.22045ms
[0.15219125, 0.977695, 0.9928536, 1.0046335, 1.0219395, 552.4709]

1 Answer

Answer (score 4) by Schwern:

Since result is never used, Rust has likely optimized the body of your main function away, so your first measurement may be timing nothing at all. In effect, you may be benchmarking this:

fn main() -> Result<(), ()> {
    // ... Some code for loading data, ends up with:
    // let series: ChunkedArray<Float32Type>

    let start = Instant::now();

    // let percentiles = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0];
    // let mut result = Vec::<f32>::with_capacity(percentiles.len());
    // for p in percentiles {
    //     match series.quantile(p as f64, QuantileInterpolOptions::Lower) {
    //         Ok(v) => result.push(v.unwrap()),
    //         _ => {}
    //     }
    // }

    let duration = start.elapsed();
    println!("Time elapsed is: {:?}", duration);
    Ok(())
}

By putting the code into a function, the Rust optimizer may not realize the function call is not necessary, so it's calling the function, doing the work, and throwing out the result.

You can avoid this by doing something with the result, such as printing it or passing it to std::hint::black_box.
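A minimal sketch of the black_box technique (std::hint::black_box is stable since Rust 1.66; the summing workload here is a made-up example, not the quantile code from the question):

```rust
use std::hint::black_box;
use std::time::Instant;

fn main() {
    let data: Vec<f32> = (0..1_000_000).map(|i| i as f32).collect();

    let start = Instant::now();
    let sum: f32 = data.iter().sum();
    // black_box tells the optimizer the value is observed, so the
    // computation above cannot be eliminated as dead code.
    black_box(sum);
    println!("Time elapsed is: {:?}", start.elapsed());
}
```

For serious measurements, a benchmarking harness such as criterion applies black_box for you and also handles warm-up and statistical noise.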