I am trying to implement a range mapper for TRNG output files in a C application, with ranges of up to 4 bits in size. Due to the pigeonhole bias problem I have settled on using a discard algorithm.
My idea for a parsimonious algorithm would be something like:
-- Read 16 bytes from the file and store them as an indexed 128-bit unsigned integer "bitbucket", to be bitmask-selected n bits at a time.
-- Predetermine, as far as possible, the ranges/buckets required for each input and store them in an array.
-- For each n bits sipped from the bitbucket, select an input from the array that will not discard it, if one exists. If 2 bits cannot find an input, try 3 bits; if that cannot find an input, try 4 bits. At first, when there are many inputs, it should be easy not to discard, but as the choice of inputs gets low, discards will become more common. I am not entirely sure whether it is better to start with fewer bits and work my way up, or to do the opposite.
The downside of this bit-sipping range mapper seems to be that I need to assume about twice as much random input data as would be required with biased scaling methods. For instance, a 9-bucket input fed from a 4-bit rand output will miss 7 of the 16 possible values, i.e. about 44% of the time.
Existing implementations/algorithms: this (the mathforum / Dr. Math example) seems like an example of a more complex and efficient method of parsimonious range mapping, but I find the explanation entirely impenetrable. Can anyone explain it to me in English, or suggest a book I might read or a university class I might take that would give me the background to understand it?
There is also arc4random, which seems to be a runtime-optimized unbiased modulo-discard implementation. Like almost all unbiased range mapper implementations I have found, it does not seem to particularly care about how much data it uses. That does not, however, mean that it is necessarily less data efficient, because it has the advantage of fewer misses.
The basic idea of arc4random seems to be that, as long as the number of pigeons (max_randvalue_output) is evenly divisible by the number of holes (rangeupperbound), the modulo function itself is an elegant and unbiased range mapper. However, modulo only seems to be relevant when you are not bit sipping, i.e. when the output from the random source is more than ceil(log2(buckets)) bits.
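In rough Python terms the modulo-discard idea looks something like this (a sketch of the idea only, not arc4random's actual code; `rand_byte()` is a stand-in for reading one byte of TRNG output):

```python
import random

def rand_byte():
    """Stand-in for reading one byte from the TRNG output file."""
    return random.getrandbits(8)

def modulo_discard(n):
    """Map whole TRNG bytes to 0..n-1 without bias by discarding bytes at or
    above the largest multiple of n that fits in 256 (requires n <= 256)."""
    limit = 256 - (256 % n)     # largest multiple of n that is <= 256
    while True:
        b = rand_byte()         # uniform in 0..255
        if b < limit:           # exactly limit // n pigeons per hole
            return b % n
```

With 9 buckets, `limit` is 252, so only 4 of the 256 possible byte values are ever discarded, which is why this style of mapper misses so rarely.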
There seems to be a tradeoff between the number of 'wasted' random bits and the percentage of discards. The percentage of misses shrinks as the number of excess bits in the input to the range mapper grows. It seems like there should be a mathematical way to compare the data efficiency of a bit-sipping range mapper with a more bit-hungry version with fewer misses, but I don't know it.
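For the simple reject-and-retry case (no pooling of leftover bits) the comparison can at least be sketched: with k-bit reads and n buckets, the expected number of bits consumed per output is k divided by the acceptance probability. A rough back-of-envelope in Python (my own sketch, ignoring any reuse of discarded bits):

```python
import math

def expected_bits(k, n):
    """Expected TRNG bits per output for: read k bits, accept if the value is
    below the largest multiple of n that fits in 2**k, otherwise retry."""
    accept = n * (2**k // n)           # size of the acceptance region
    return k * (2**k / accept)         # k bits per read / acceptance probability

print(expected_bits(4, 9))             # ~7.11 bits per value (4-bit sips)
print(expected_bits(8, 9))             # ~8.13 bits per value (whole bytes)
print(math.log2(9))                    # ~3.17 bits of entropy actually needed
```

On that crude measure the 4-bit sip actually consumes fewer expected bits for 9 buckets, despite missing far more often; schemes that pool the leftover randomness can do better still.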
So my plan is to just write two implementations: a bit-sipping parsimonious type of range mapper that may or may not be a little like the mathforum example (which I don't understand), and an invariant byte-input modulo range mapper which accepts byte inputs from a TRNG and uses a discard-from-the-top-of-the-largest-multiple modulo method of debiasing to match (x)n pigeons to n holes, which is intended to be like arc4random. When finished I plan to post them on codereview.
I am basically looking for help or advice with any of these issues that might help me to write a more parsimonious but still unbiased range mapper, particularly with respect to my parsimonious algorithm. Runtime efficiency is not a priority.
I looked at the "Fast Dice Roller" (FDR) pointed to by @Peter.O, which is indeed simple (and avoids dividing). But each time a random number is generated, this will eat some number of bits and discard the fraction of those bits it does not use.
The "batching"/"pooling" techniques seem to do better than FDR, because the unused fractions of bits are (at least partly) retained.
But interestingly, the DrMath thing you referenced is basically the same as the FDR, but does not start from scratch for each random value it returns.
So the FDR to return 0..n-1 goes:
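Roughly, in Python (a sketch of the published algorithm rather than anyone's exact code; `next_bit()` is an assumed helper returning one random bit):

```python
import random

def next_bit():
    """Stand-in for one random bit from whatever source is in use."""
    return random.getrandbits(1)

def fdr(n):
    """Fast Dice Roller: return a value uniform in 0..n-1, one bit at a time.
    Each call starts afresh, so leftover randomness is not carried over."""
    v, c = 1, 0                 # invariant: c is uniform in 0..v-1
    while True:
        v = 2 * v               # double the range covered so far...
        c = 2 * c + next_bit()  # ...and fold in one more random bit
        if v >= n:
            if c < n:
                return c        # accept
            v -= n              # reject: keep only the leftover range v - n
            c -= n              # ...and c, still uniform within it
```

Whatever is left in `v` and `c` when a value is returned is simply thrown away, which is the waste the pooling approaches avoid.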
The DrMath thing goes essentially the same way but, as noted, keeps track of the unused randomness rather than starting from scratch for each value; there is working Python for it further down.
When testing, I constructed 100000 values for `n = 3`, where the `cost` is `bits-used / (100000 * log2(3))`, noting that `log2(3) = 1.58496`. (So the `cost` is the number of bits used divided by the number of bits one would hope to use.) I also constructed 100000 values where, for each one, a range in `5..60` (inclusive) was chosen. It seems to me that DrMath has it! Though for larger ranges it has less of an advantage.
Mind you... DrMath uses at least 2 divisions per random value returned, which gives me conniptions run-time-wise. But you did say you weren't interested in run-time efficiency.
How does it work?
So, we want a sequence of random values `r` to be uniformly distributed in a range `0..n-1`. Inconveniently, we only have a source of randomness which gives us random values which are uniformly distributed in `0..m-1`. Typically `m` will be a power of 2 -- and let us assume that `n < m` (if `n == m` the problem is trivial, if `n > m` the problem is impossible). For any `r` we can take `r MOD n` to give a random value in the required range. If we only use `r` when `r < n` then (trivially) we have the uniform distribution we want. If we only use `r` when `r < (n * q)` and `(n * q) <= m`, we also have a uniform distribution. We are here "rejecting" the `r` which are "too big". The fewer `r` we reject, the better. So we should choose `q` such that `(n * q) <= m < (n * (q+1))` -- so `n * q` is the largest multiple of `n` less than or equal to `m`. This, in turn, tells us that `n` "much less" than `m` is to be preferred.

When we "reject" a given `r` we can throw it all away, but that turns out not to be completely necessary. Also, `m` does not have to be a power of 2. But we will get to that later.

Here is some working Python:
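Something along these lines (my sketch of it, with the step numbers in the comments matching the description below; `read_bit()` stands in for however one bit is pulled from the TRNG, and it counts bits so the `cost` can be measured):

```python
import random                   # stand-in source; swap for the real TRNG

N = 2**63                       # pool target: must be >= the largest range wanted

m = 1                           # invariant: r is uniform in 0..m-1
r = 0
bits_used = 0                   # running count, for the cost measurements

def read_bit():
    """Read one random bit (stand-in for the TRNG) and count it."""
    global bits_used
    bits_used += 1
    return random.getrandbits(1)

def random_drmath(n):
    """Return a value uniformly distributed in 0..n-1, reusing the pool."""
    global m, r
    while True:
        # (2) top the pool up to at least N
        while m < N:
            m = m * 2
            r = r * 2 + read_bit()
        # (3) nq = largest multiple of n which is <= m
        q  = m // n
        nq = n * q
        if r < nq:
            # (4) accept: return the least significant "digit" of r base-n,
            #     and keep the remaining digits (and the shrunken m) as the pool
            result = r % n
            r = r // n
            m = q
            return result
        # (5) reject: keep r - nq, which is still uniform in 0..(m - nq) - 1
        r = r - nq
        m = m - nq
```

Successive calls draw from the same pool, so the unused randomness left over from one call is carried into the next.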
Must have `N` >= maximum range, preferably much bigger. `2**31` or `2**63` are obvious choices.

On the first call of `random_drmath()`, step (2) will read random bits to "fill the pool". For `N = 2**63`, it will end up with `m = 2**63` and `r` with 63 random bits. Clearly `r` is random and uniformly distributed in `0..m-1`. [So far, so good.]

Now (and on all further calls of `random_drmath()`) we hope to extract a random value uniformly in `0..n-1` from `r`, as discussed above. So -- step (3) -- we construct `nq`, which is the largest multiple of `n` which is less than or equal to `m`. If `r >= nq` we cannot use it, because there are fewer than `n` values in `nq..m-1` -- this is the usual "reject" criterion.

So, where `r < nq`, we can return a value -- step (4). The trick here is to think of `m` and `r` as numbers "base-`n`". The least significant "digit" of `r` is extracted (`r % n`) and returned. Then `m` and `r` are shifted right by one "digit" (`q = m // n` and `r // n`) and stored in the "pool". I think that it is clear that at this point we still have `r < m`, with `r` random and uniformly distributed in `0..m-1`. But `m` is no longer a power of 2 -- but that's OK.
But, if `r >= nq`, we must reduce `r` and `m` together -- step (5) -- and try again. Trivially, we could set `m = 1 ; r = 0` and start again. But what we do is subtract `nq` from both `m` and `r`. That leaves `r` uniformly distributed in `0..m-1`. This last step feels like magic, but we know that `r` is in `nq..m-1` and each possible value has equal probability, so `r - nq` is in the range `0..m-nq-1` and each possible value still has equal probability! [Remember that the 'invariant' at the top of the `while` loop is that `r` is random and uniformly distributed in `0..m-1`.]
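To make steps (4) and (5) concrete, a toy trace with deliberately small numbers (illustration only; a real pool would be far larger):

```python
n, m, r = 10, 128, 97          # pretend the pool holds m = 128, r = 97
q  = m // n                    # q  = 12
nq = n * q                     # nq = 120

# accept path: r = 97 < 120, so return 97 % 10 = 7
# and keep r = 97 // 10 = 9 with m = q = 12 (r still uniform in 0..11)
print(r % n, r // n, q)        # -> 7 9 12

# reject path: suppose instead r = 125 >= 120
# keep r = 125 - 120 = 5 with m = 128 - 120 = 8 (r still uniform in 0..7)
print(125 - nq, m - nq)        # -> 5 8, then go back and refill the pool
```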
For small `n` the rejection step will discard most of `r`, but for small `n` (compared to `N`) we hope not to reject very often. Conversely, for large `n` (compared to `N`) we may expect to reject more often, but this retains at least some of the random bits we have eaten so far. I feel there might be a way to retain more of `r`... but I haven't thought of a simple way to do that... and if the cost of reading one random bit is high, it might be worth trying to find a not-simple way!

FWIW: setting `N = 128`, the cost per value goes up as `n` approaches `N`.
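A measurement loop for the `cost`, in the sense defined above, might look something like this (my sketch; it assumes it sits in the same file as the `random_drmath()` sketch so it can reuse `bits_used`, and the exact figures will vary from run to run):

```python
import math

# cost = bits-used / (count * log2(n)), as defined above
N = 128                        # small pool, to match the FWIW case above
count = 100000
for n in (3, 5, 20, 60, 100, 127):
    bits_used = 0              # reset the counter maintained by read_bit()
    for _ in range(count):
        random_drmath(n)
    print(f"n = {n:3d}  cost = {bits_used / (count * math.log2(n)):.3f}")
```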