How to benchmark memory usage of a function?


I noticed that Rust's built-in test framework has a benchmark mode that measures execution time in ns/iter, but I could not find a way to measure memory usage.

How would I implement such a benchmark? Let us assume for the moment that I only care about heap memory (though stack usage would certainly also be interesting).

Edit: I found this issue, which asks for the exact same thing.


There are 5 answers

Answer by ArtemGr (accepted):

You can use the jemalloc allocator to print the allocation statistics. For example,

Cargo.toml:

[package]
name = "stackoverflow-30869007"
version = "0.1.0"
edition = "2018"

[dependencies]
jemallocator = "0.5"
jemalloc-sys = {version = "0.5", features = ["stats"]}
libc = "0.2"

src/main.rs:

use libc::{c_char, c_void};
use std::ptr::{null, null_mut};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

extern "C" fn write_cb(_: *mut c_void, message: *const c_char) {
    print!("{}", String::from_utf8_lossy(unsafe {
        std::ffi::CStr::from_ptr(message as *const i8).to_bytes()
    }));
}

fn mem_print() {
    unsafe { jemalloc_sys::malloc_stats_print(Some(write_cb), null_mut(), null()) }
}

fn main() {
    mem_print();
    let _heap = Vec::<u8>::with_capacity(1024 * 128);
    mem_print();
}

In a single-threaded program this should give you a good measurement of how much memory a structure takes: print the statistics before the structure is created and again afterwards, then calculate the difference.

(In particular, look at the "allocated" column of the "total:" row.)


You can also use Valgrind (Massif) to get a heap profile. It works just as it would with any other C program. Make sure debug symbols are enabled in the executable (e.g. by using a debug build or a custom Cargo profile; see the snippet below). You can then use, say, http://massiftool.sourceforge.net/ to analyse the generated heap profile.

(I verified this to work on Debian Jessie; in a different setting your mileage may vary.)
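For the custom Cargo profile mentioned above, the standard way (shown here as a hedged example) is to keep debug info in optimized builds:

# Cargo.toml: keep debug symbols in release builds so Massif/DHAT can resolve names
[profile.release]
debug = true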

(In order to use Rust with Valgrind you'll probably have to switch back to the system allocator; see below.)
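Switching back is a one-liner nowadays (and since Rust 1.32 the system allocator is already the default unless you opt into jemalloc, as the example above does):

use std::alloc::System;

#[global_allocator]
static GLOBAL: System = System;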

P.S. Valgrind's DHAT (a dedicated heap profiler) has since been significantly improved and is now also a good option.


jemalloc can also be told to dump a memory profile. You can probably trigger this through the Rust FFI, but I haven't investigated this route; a sketch of what it might look like follows.
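This is a hedged sketch only: it assumes jemalloc was built with profiling support (jemalloc-sys exposes a build feature for that) and that profiling is switched on at run time through jemalloc's MALLOC_CONF mechanism; if profiling is off, the mallctl call simply returns a non-zero error code.

use libc::c_char;
use std::ptr::null_mut;

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

// Ask jemalloc to dump a heap profile to an automatically named file.
// Returns jemalloc's error code (0 on success, non-zero e.g. when profiling is disabled).
fn dump_profile() -> i32 {
    // mallctl names must be NUL-terminated C strings.
    const PROF_DUMP: &[u8] = b"prof.dump\0";
    unsafe {
        jemalloc_sys::mallctl(
            PROF_DUMP.as_ptr() as *const c_char,
            null_mut(), // no old value requested
            null_mut(), // so no old-value length either
            null_mut(), // no new value: dump to the default file name pattern
            0,
        )
    }
}

fn main() {
    let heap = vec![0u8; 1024 * 1024];
    let rc = dump_profile();
    println!("prof.dump returned {} ({} bytes still live)", rc, heap.len());
}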

Answer by llogiq:

Currently, the only way to get allocation information is the alloc::heap::stats_print() function (behind #![feature(alloc)]), which calls jemalloc's print_stats().

I'll update this answer with further information once I have learned what the output means.

(Note that I'm not going to accept this answer, so if someone comes up with a better solution...)

Answer by Chris Morgan:

As far as measuring data structure sizes is concerned, this can be done fairly easily through the use of traits and a small compiler plugin. Nicholas Nethercote, in his article Measuring data structure sizes: Firefox (C++) vs. Servo (Rust), demonstrates how it works in Servo; it boils down to adding #[derive(HeapSizeOf)] (or occasionally a manual implementation) to each type you care about. This is also a good way of allowing precise checking of where memory is going; it is, however, comparatively intrusive, as it requires changes to the types in the first place, whereas something like jemalloc's print_stats() doesn't. Still, for good and precise measurements, it's a sound approach; a hand-rolled sketch of the idea is shown below.
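The derive itself lives in Servo's codebase (and later in the heapsize/malloc_size_of crates), but the underlying idea is just a trait that each type implements by summing the heap usage of its fields. A minimal hand-rolled sketch of that idea, with illustrative names that do not match the exact Servo API:

use std::mem::size_of;

trait HeapSize {
    // Heap bytes owned by this value, excluding size_of::<Self>() itself.
    fn heap_size_of_children(&self) -> usize;
}

impl HeapSize for u64 {
    fn heap_size_of_children(&self) -> usize { 0 }
}

impl HeapSize for String {
    fn heap_size_of_children(&self) -> usize {
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Vec<T> {
    fn heap_size_of_children(&self) -> usize {
        self.capacity() * size_of::<T>()
            + self.iter().map(HeapSize::heap_size_of_children).sum::<usize>()
    }
}

struct Session {
    name: String,
    samples: Vec<u64>,
}

// This impl is what a #[derive(HeapSizeOf)]-style derive would generate for you.
impl HeapSize for Session {
    fn heap_size_of_children(&self) -> usize {
        self.name.heap_size_of_children() + self.samples.heap_size_of_children()
    }
}

fn main() {
    let s = Session { name: "demo".to_owned(), samples: vec![0; 1024] };
    println!("heap bytes owned by s: {}", s.heap_size_of_children());
}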

Answer by Venryx:

There's a neat little solution someone put together here: https://github.com/discordance/trallocator/blob/master/src/lib.rs

use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicU64, Ordering};

pub struct Trallocator<A: GlobalAlloc>(pub A, AtomicU64);

unsafe impl<A: GlobalAlloc> GlobalAlloc for Trallocator<A> {
    unsafe fn alloc(&self, l: Layout) -> *mut u8 {
        self.1.fetch_add(l.size() as u64, Ordering::SeqCst);
        self.0.alloc(l)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, l: Layout) {
        self.0.dealloc(ptr, l);
        self.1.fetch_sub(l.size() as u64, Ordering::SeqCst);
    }
}

impl<A: GlobalAlloc> Trallocator<A> {
    pub const fn new(a: A) -> Self {
        Trallocator(a, AtomicU64::new(0))
    }

    pub fn reset(&self) {
        self.1.store(0, Ordering::SeqCst);
    }
    pub fn get(&self) -> u64 {
        self.1.load(Ordering::SeqCst)
    }
}

Usage (from https://www.reddit.com/r/rust/comments/8z83wc/comment/e2h4dp9):

// These feature gates were needed on older nightly compilers for the
// Trallocator struct as written; on recent stable Rust you can drop this line.
#![feature(integer_atomics, const_fn_trait_bound)]

use std::alloc::System;

// Bring `Trallocator` (from the snippet above) into scope; the path depends on
// where you put it, e.g. `use trallocator::Trallocator;`.

#[global_allocator]
static GLOBAL: Trallocator<System> = Trallocator::new(System);

fn main() {
    GLOBAL.reset();
    println!("memory used: {} bytes", GLOBAL.get());
    {
        let mut vec = vec![1, 2, 3, 4];
        for i in 5..20 {
            vec.push(i);
            println!("memory used: {} bytes", GLOBAL.get());
        }
        for v in vec {
            println!("{}", v);
        }
    }
    // Note: this typically won't print zero, because other allocations
    // (e.g. stdout's internal buffer) are still live at this point.
    println!("memory used: {} bytes", GLOBAL.get());
}

I've just started using it, and it seems to work well! Straightforward, real-time, requires no external packages, and doesn't require changing your base memory allocator.

It's also nice that, because it intercepts the allocate/deallocate calls, you should be able to add custom logic if desired (e.g. if memory usage goes above X, print the stack trace to see what's triggering the allocations), although I haven't tried this yet; one such extension is sketched below.
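Printing a backtrace from inside alloc takes some care (capturing the trace allocates, so you would need a reentrancy guard), but a simpler variant of the same idea is easy. Here is a hedged sketch, with illustrative names, of a wrapper that additionally tracks the peak (high-water-mark) usage:

use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicU64, Ordering};

pub struct PeakTrallocator<A: GlobalAlloc> {
    inner: A,
    current: AtomicU64,
    peak: AtomicU64,
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for PeakTrallocator<A> {
    unsafe fn alloc(&self, l: Layout) -> *mut u8 {
        // Update the running total, then remember the highest value ever seen.
        let now = self.current.fetch_add(l.size() as u64, Ordering::SeqCst) + l.size() as u64;
        self.peak.fetch_max(now, Ordering::SeqCst);
        self.inner.alloc(l)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, l: Layout) {
        self.inner.dealloc(ptr, l);
        self.current.fetch_sub(l.size() as u64, Ordering::SeqCst);
    }
}

impl<A: GlobalAlloc> PeakTrallocator<A> {
    pub const fn new(inner: A) -> Self {
        PeakTrallocator { inner, current: AtomicU64::new(0), peak: AtomicU64::new(0) }
    }
    pub fn current(&self) -> u64 { self.current.load(Ordering::SeqCst) }
    pub fn peak(&self) -> u64 { self.peak.load(Ordering::SeqCst) }
}

It drops in the same way as Trallocator, via #[global_allocator] static GLOBAL: PeakTrallocator<System> = PeakTrallocator::new(System);.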

I also haven't yet tested to see how much overhead this approach adds. If someone does a test for this, let me know!

Answer by diralik:

There is now the jemalloc_ctl crate, which provides a convenient, safe, typed API. Add it to your Cargo.toml:

[dependencies]
jemalloc-ctl = "0.3"
jemallocator = "0.3"

Then configure jemalloc as the global allocator and use the methods from the jemalloc_ctl::stats module. Here is the official example:

use std::thread;
use std::time::Duration;
use jemalloc_ctl::{stats, epoch};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    loop {
        // many statistics are cached and only updated when the epoch is advanced.
        epoch::advance().unwrap();

        let allocated = stats::allocated::read().unwrap();
        let resident = stats::resident::read().unwrap();
        println!("{} bytes allocated/{} bytes resident", allocated, resident);
        thread::sleep(Duration::from_secs(10));
    }
}
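To get something closer to the per-function benchmark the question asks for, you can wrap the code under test between two reads of stats::allocated and take the difference. A minimal sketch, assuming the same jemalloc setup as above; heap_delta is an illustrative helper (not part of the crate) and it measures the net allocation of a single run, including anything still alive when the closure returns, rather than a ns/iter-style average:

use jemalloc_ctl::{epoch, stats};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

// Runs `f` once and returns its result together with the net change in
// jemalloc's "allocated" statistic (in bytes) across the call.
fn heap_delta<T>(f: impl FnOnce() -> T) -> (T, i64) {
    epoch::advance().unwrap();
    let before = stats::allocated::read().unwrap() as i64;
    let result = f();
    epoch::advance().unwrap();
    let after = stats::allocated::read().unwrap() as i64;
    (result, after - before)
}

fn main() {
    let (buf, delta) = heap_delta(|| vec![0u8; 128 * 1024]);
    println!("building a {}-byte Vec grew the heap by {} bytes", buf.len(), delta);
}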