uniformly distributed unbiased 4bit parsimonious range mapping from a bit limited TRNG

188 views Asked by At

I am trying to implement a range mapper for TRNG output files for a C application with ranges of up to 4 bits in size. Due to the pigeonhole bias problem I have settled on using a discard algorithm.

My idea for a parsimonious algorithm would be something like:

-- Read 16 bytes from file and store as an indexed 128 bit unsigned integer bitbucket to be bitmask selected n bits at a time.
-- Predetermine as much as possible the ranges/buckets required for each input and store in an array.
-- For each n bits in the bitbucket select an input from the array that will not discard it if one exists. If 2 bits cannot find an input try 3 bits and if that cannot find an input try with 4 bits. At first when there are many inputs it should be easy not to discard, but as the choice of inputs gets low discards will become more common. I am not entirely sure if it is better to start with fewer bits and work my way up or to do the opposite.

The downside of this bit sipping range mapper seems to be that I need to assume about twice as much random input data as would be required with biased scaling methods. For instance a 9 bucket input from a 4 bit rand output will miss about 43% of the time.

Existing implementations/algorithms: This seems like an example of a more complex and efficient method of parsimonious range mapping but I find his explanation entirely impenetrable. Can anyone explain it to me in English or suggest a book I might read or a university class I might take that would give me a background to understand it?

There is also arc4random which seems to be a runtime optimized unbiased modulo discard implementation. Like almost all unbiased range mapper implementations I have found this seems not to particularly care about how much data it uses. That does not however mean that it is necessarily less data efficient because it has the advantage of fewer misses.

The basic idea of arc4random seems to be that as long as the number of pigeons (max_randvalue_output) is evenly divisible by the number of holes (rangeupperbound) the modulo function itself is an elegant and unbiased range mapper. However modulo only seems to be relevant when you are not bit sipping, i.e. when the output from the random source is more than ceil(log2(buckets)) bits.

There seems to be a tradeoff between the number of 'wasted' random bits and the percentage of discards. The percentage of misses is inversely proportional to the number of excess bits in the input to the range mapper. It seems like there should be a mathematical way to compare the data efficiency of a bit sipping range mapper with a more bit hungry version with fewer misses, but I don't know it.

So my plan is to just write two implementations: a bit sipping parsimonious type of range mapper that may or may not be a little like the mathforum example (which I don't understand) and an invariant byte input modulo range mapper which accepts byte inputs from a TRNG and uses a discard-from-the-top-of-largest-multiple modulo method of debiasing to match (x)n pigeons to n holes which is intended to be like arc4random. When finished I plan to post them on codereview.

I am basically looking for help or advice with any of these issues that might help me to write a more parsimonious but still unbiased range mapper particularly with respect to my parsimonious algorithm. Runtime efficiency is not a priority.

2

There are 2 answers

11
Chris Hall On BEST ANSWER

I looked at the "Fast Dice Roller" (FDR) pointed to by @Peter.O, which is indeed simple (and avoids dividing). But each time a random number is generated, this will eat some number of bits and discard the fraction of those bits it does not use.

The "batching"/"pooling" techniques seem to do better than FDR, because the unused fractions of bits are (at least partly) retained.

But interestingly, the DrMath thing you referenced is basically the same as the FDR, but does not start from scratch for each random value it returns.

So the FDR to return 0..n-1 goes:

  random(n):
    m = 1 ; r = 0 
    while 1:
        # Have r random and evenly distributed in 0..m-1
        # Need m >= n -- can double m and double r adding random bit until
        #                we get that.  r remains evenly distributed in 0..m-1 
        while m < n: r = 2*r + next_bit() ; m = m*2
        # Now have r < m and n <= m < n*2
        if r < n: return r   # Hurrah !
        # Have overshot, so reduce m and r to m MOD n and r MOD m
        m -= n ; r -= n ;

The DrMath thing goes:

  # Initialisation once before first call of random(m)
  ms = 1 ; rs = 0
  N = ... # N >= maximum n and N*2 does not overflow 

  # The function -- using the "static"/"global" ms, rs and N 
  random(n):
    m = ms ; r = rs
    while 1:
        # Same as FDR -- except work up to N not n
        while m < N: r = 2*r + next_bit() ; m = m*2 ;
        # Now have r < m and m >= N
        # Set nq = largest multiple of n <= m
        # In FDR, at this point q = 1 and nq = n
        q  = m DIV n ;
        nq = n * q
        if r < nq:             # all set if r < nq
            # in FDR ms = 1, rs = 0 
            ms = q             # keep stuff not used this time
            rs = r DIV n       # ditto
            return r MOD n     # hurrah !
        # Overshot, so reduce MOD n*q -- remembering, for FDR q == 1
        m = m - nq 
        r = r - nq

which, as noted, is basically the same as FDR, but keeps track of the unused randomness.

When testing I found:

  FDR:    for 100000 values range=3 used 266804 bits cost=1.6833
  DrMath: for 100000 values range=3 used 158526 bits cost=1.0002

Where the cost is bits-used / (100000 * log2(3)) noting that log2(3) = (1.58496). (So the cost is the number of bits used divided by the number of bits one would hope to use).

Also:

  FDR:    for 100000 values range=17: 576579 bits cost=1.4106
  DrMath: for 100000 values range=17: 408774 bits cost=1.0001

And:

  FDR:    for 100000 values ranges=5..60: 578397 bits cost=1.2102
  DrMath: for 100000 values ranges=5..60: 477953 bits cost=1.0001

where constructed 100000 values, and for each one chose a range in 5..60 (inclusive).

It seems to me that DrMath has it ! Though for larger ranges it has less of an advantage.

Mind you... DrMath uses at least 2 divisions per random value returned, which gives me conniptions run-time-wise. But you did say you weren't interested in run-time efficiency.


How does it work ?

So, we want a sequence of random values r to be uniformly distributed in a range 0..n-1. Inconveniently, we only have a source of randomness which gives us random values which are uniformly distributed in 0..m-1. Typically m will be a power of 2 -- and let us assume that n < m (if n == m the problem is trivial, if n > m the problem is impossible). For any r we can take r MOD n to give a random value in the required range. If we only use r when r < n then (trivially) we have the uniform distribution we want. If we only use r when r < (n * q) and (n * q) < m we also have a uniform distribution. We are here "rejecting" r which are "too big". The fewer r we reject, the better. So we should choose q such that (n * q) <= m < (n * (q-1)) -- so n * q is the largest multiple of n less than or equal to m. This, in turn, tells us that n "much less" than m is to be preferred.

When we "reject" a given r we can throw it all away, but that turns out not to be completely necessary. Also, m does not have to be a power of 2. But we will get to that later.

Here is some working Python:

M = 1
R = 0
N = (2**63)    # N >= maximum range

REJECT_COUNT = 0

def random_drmath(n):
    global M, R, REJECT_COUNT

    # (1) load m and r "pool"
    m = M
    r = R
    while 1:
        # (2) want N <= m < N*2
        #     have 0 <= r < m, and that remains true.
        #     also r uniformly distributed in 0..m-1, and that remains true
        while m < N:
            r = 2*r + next_bit()
            m = m*2
            
        # (3) need r < nq where nq = largest multiple of n <= m
        q  = m // n
        nq = n * q
        if r < nq:
            # (4) update the m and r "pool" and return 0..n-1 
            M = q
            R = r // n
            return r % n       # hurrah !

        # (5) reject: so reduce both m and r by MOD n*q
        m = m - nq 
        r = r - nq
        REJECT_COUNT += 1

Must have N >= maximum range, preferably much bigger. 2**31 or 2**63 are obvious choices.

On the first call of random_drmath() step (2) will read random bits to "fill the pool". For N = 2**63, will end up with m = 2**63 and r with 63 random bits. Clearly r is random and uniformly distributed in 0..m-1. [So far, so good.]

Now (and on all further calls of random_drmath()) we hope to extract a random value uniformly in 0..n-1 from r, as discussed above. So -- step (3) -- constructs nq which is the largest multiple of n which is less than or equal to m. If r >= nq we cannot use it, because there are fewer than n values in nq..m-1 -- this is the usual "reject" criterion.

So, where r < nq can return a value -- step (4). The trick here is to think of m and r as numbers "base-n". The ls "digit" of r is extracted (r % n) and returned. Then m and r are shifted right by one "digit" (q = m // n and r // n), and stored in the "pool". I think that it is clear that at this point r and m are still r < m and r random and uniformly distributed in 0..m-1. But m is no longer a power of 2 -- but that's OK.

But, if r >= nq must reduce r and m together -- step (5) -- and try again. Trivially, could set m = 1 ; r = 0 and start again. But what we do is subtract nq from both m and r That leaves r uniformly distributed in 0..m-1. This last step feels like magic, but we know that r is in nq..m-1 and each possible value has equal probability, so r-nq is in the range 0..m-nq-1 and each possible value still has equal probability ! [Remember that the 'invariant' at the top of the while loop is that r is random and uniformly distributed in 0..m-1.]

For small n the rejection step will discard most of r, but for small n (compared to N) we hope not to reject very often. Conversely, for large n (compared to N) we may expect to reject more often, but this retains at least some of the random bits we have eaten so far. I feel there might be a way to retain more of r... but a haven't thought of a simple way to do that... and if the cost of reading one random bit is high, it might be worth trying to find a not-simple way !

FWIW: setting N = 128 I get:

  FDR:    for 100000 values ranges=3.. 15: 389026 bits cost=1.2881
  DrMath: for 100000 values ranges=3.. 15: 315815 bits cost=1.0457
  
  FDR:    for 100000 values ranges 3.. 31: 476428 bits cost=1.2371
  DrMath: for 100000 values ranges 3.. 31: 410195 bits cost=1.0651
  
  FDR:    for 100000 values ranges 3.. 63: 568687 bits cost=1.2003
  DrMath: for 100000 values ranges 3.. 63: 517674 bits cost=1.0927
  
  FDR:    for 100000 values ranges 3..127: 664333 bits cost=1.1727
  DrMath: for 100000 values ranges 3..127: 639269 bits cost=1.1284

so as n approaches N the cost per value goes up.

2
Peter O. On

There is a far simpler approach to generating random numbers in a range from a random bit stream, which is not only optimally efficient, but also exact. It's called the "Fast Dice Roller" method of J. Lumbroso:

"Optimal Discrete Uniform Generation from Coin Flips, and Applications", 2013.

See also this question.