bernoulli_distribution vs uniform_int_distribution

Comparing a default-constructed bernoulli_distribution (a 50/50 chance of true or false) with uniform_int_distribution{0, 1} (an equal chance of 0 or 1), I find that bernoulli_distribution is anywhere from 2x to 6x slower than uniform_int_distribution, even though the two give equivalent results.

I would expect bernoulli_distribution to perform better, since it is designed specifically for a two-outcome (true/false) probability; yet it doesn't.

Given the above and the performance metrics below, are there practical uses for bernoulli_distribution over uniform_int_distribution?

Results over 5 runs (Release mode, x64; see the edit below for Release runs without the debugger attached):

bernoulli: 58 ms
false: 500690
true: 499310

uniform: 9 ms
1: 499710
0: 500290
----------
bernoulli: 57 ms
false: 500921
true: 499079

uniform: 9 ms
0: 499614
1: 500386
----------
bernoulli: 61 ms
false: 500440
true: 499560

uniform: 9 ms
0: 499575
1: 500425
----------
bernoulli: 59 ms
true: 498798
false: 501202

uniform: 9 ms
1: 499485
0: 500515
----------
bernoulli: 58 ms
true: 500777
false: 499223

uniform: 9 ms
0: 500450
1: 499550
----------

Profiling code:

#include <chrono>
#include <random>
#include <iostream>
#include <unordered_map>

int main() {

    auto gb = std::mt19937{std::random_device{}()};
    auto bd = std::bernoulli_distribution{};
    auto bhist = std::unordered_map<bool, int>{};

    auto start = std::chrono::steady_clock::now();
    for(int i = 0; i < 1'000'000; ++i) {
        bhist[bd(gb)]++;
    }
    auto end = std::chrono::steady_clock::now();
    auto dif = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    std::cout << "bernoulli: " << dif.count() << " ms\n";
    std::cout << std::boolalpha;
    for(auto& b : bhist) {
        std::cout << b.first << ": " << b.second << '\n';
    }
    std::cout << std::noboolalpha;
    std::cout << '\n';

    auto gu = std::mt19937{std::random_device{}()};
    auto u = std::uniform_int_distribution<int>{0, 1};
    auto uhist = std::unordered_map<int, int>{};

    start = std::chrono::steady_clock::now();
    for(int i = 0; i < 1'000'000; ++i) {
        uhist[u(gu)]++;
    }
    end = std::chrono::steady_clock::now();
    dif = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    std::cout << "uniform: " << dif.count() << " ms\n";
    for(auto& b : uhist) {
        std::cout << b.first << ": " << b.second << '\n';
    }
    std::cout << '\n';
}

EDIT

I re-ran the test without the debugger attached and bernoulli_distribution still ran a good 4x slower:

bernoulli: 37 ms
false: 500250
true: 499750

uniform: 9 ms
0: 500433
1: 499567
-----
bernoulli: 36 ms
false: 500595
true: 499405

uniform: 9 ms
0: 499061
1: 500939
-----
bernoulli: 36 ms
false: 500988
true: 499012

uniform: 8 ms
0: 499596
1: 500404
-----
bernoulli: 36 ms
true: 500425
false: 499575

uniform: 8 ms
0: 499974
1: 500026
-----
bernoulli: 36 ms
false: 500847
true: 499153

uniform: 8 ms
0: 500082
1: 499918
-----

There are 3 answers

Casey (accepted answer)

Some comments and answers suggest using uniform_real_distribution instead.

I tested uniform_real_distribution(0.0f, nextafter(1.0f, 20.f)) (the nextafter is there to account for the distribution's half-open range) against bernoulli_distribution, and bernoulli_distribution is faster by about 20%-25% regardless of the probability. It also gave more correct results: with a true probability of 1.0, my uniform_real_distribution version actually produced false negatives (granted, one or two out of five one-million-iteration runs), while bernoulli_distribution correctly produced none.
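
Roughly, the urd-based trial being compared looks something like the following sketch (the helper name is just illustrative, and 2.0f gives the same next-representable value as the 20.f used above):

#include <cmath>
#include <random>

// Sketch of a Bernoulli trial emulated with uniform_real_distribution.
// nextafter() pushes the upper bound one step past 1.0f because the
// distribution's range is half-open [a, b), and we want the closed range [0, 1].
bool urd_trial(std::mt19937& gen, float probability) {
    std::uniform_real_distribution<float> dist{0.0f, std::nextafter(1.0f, 2.0f)};
    return dist(gen) < probability; // '<' is why probability == 1.0f can still
                                    // miss when dist happens to yield exactly 1.0f
}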

So, speed-wise: bernoulli_distribution is faster than uniform_real_distribution but slower than uniform_int_distribution.

Long story short: use the right tool for the job, don't reinvent the wheel, the standard library is well built, and so on. Depending on the use case, one is better than the other.

For a weighted yes/no check (e.g. IsPercentChance(float probability)), bernoulli_distribution is faster and better.

For a pure "give me a random bool value", uniform_int_distribution is faster and better.
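
For illustration, an IsPercentChance helper along those lines could be as simple as the following (the name comes from the point above; the body is just one possible sketch):

#include <random>

// One possible shape of the IsPercentChance helper mentioned above.
bool IsPercentChance(std::mt19937& gen, float probability) {
    std::bernoulli_distribution dist{probability}; // probability of returning true
    return dist(gen);
}

Constructing the distribution per call keeps the sketch short; a longer-lived helper would typically keep the engine and distribution around between calls.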

ALX23z

The class bernoulli_distribution is used to generate booleans with possibly uneven ratios. To achieve that, it has to generate a floating-point number in the [0, 1] range and then compare it against the given probability (or do something equivalent).

It is rather obvious that such a routine is likely to be slower than taking a random integer modulo 2, which is pretty much all it takes to produce a uniform value in {0, 1} from a random number.
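
Roughly, the two strategies boil down to something like this (an illustrative sketch, not the actual library implementations, which differ between standard libraries):

#include <limits>
#include <random>

// Bernoulli-style: draw a real number in [0, 1) and compare it to p.
bool bernoulli_style(std::mt19937& gen, double p) {
    return std::generate_canonical<double,
               std::numeric_limits<double>::digits>(gen) < p;
}

// Uniform-{0, 1}-style: for a 50/50 outcome, one low bit of the
// engine's output is essentially all that is needed.
bool uniform_style(std::mt19937& gen) {
    return gen() & 1u;
}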

Why is that surprising? Only if the compiler somehow figured out during compilation that the probability is 50/50 and eliminated the unnecessary operations could the performance even match.

Kevin

A default-constructed std::bernoulli_distribution gives equal weight to both outcomes, but you can give it a different distribution parameter to change the probabilities, and supporting that flexibility may add extra work. A fairer comparison is to use a std::uniform_real_distribution<double> and compare its result against 0.5 (by default it produces a random number in the range [0, 1)).

See here for an example:
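
As a sketch (not necessarily the code behind the link), the extra "real" case can be set up in the same style as the question's benchmark:

#include <chrono>
#include <iostream>
#include <random>
#include <unordered_map>

int main() {
    auto gr = std::mt19937{std::random_device{}()};
    auto urd = std::uniform_real_distribution<double>{}; // default range is [0, 1)
    auto rhist = std::unordered_map<int, int>{};

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < 1'000'000; ++i) {
        rhist[urd(gr) < 0.5 ? 1 : 0]++; // compare against 0.5 for a 50/50 split
    }
    auto end = std::chrono::steady_clock::now();
    auto dif = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    std::cout << "real: " << dif.count() << " ms\n";
    for (auto& b : rhist) {
        std::cout << b.first << ": " << b.second << '\n';
    }
}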

gcc output:

bernoulli: 28 ms
false: 499818
true: 500182

uniform: 31 ms
1: 500686
0: 499314

real: 29 ms
1: 500191
0: 499809

clang output:

bernoulli: 106 ms
false: 500662
true: 499338

uniform: 23 ms
1: 501263
0: 498737

real: 101 ms
1: 499683
0: 500317

The results are about the same using gcc (multiple runs tend to give uniform int a higher time, contrary to what you saw). With clang I get bernoulli and real about the same, with uniform int taking much less time. Both are compiled with -O3.