This is based on this question. A number of the proposed answers generate non-uniform distributions, and I started wondering how to quantify the non-uniformity of the output. I'm not looking for patterning issues, just single-value aspects.
What are the accepted procedures?
My current thinking is to compute the average Shannon entropy per call by computing the entropy of each value and taking a weighted average. This can then be compared to the expected value for a uniform distribution, i.e. log2 of the output range size.
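A minimal sketch of that computation (in C, assuming the per-value probabilities are already in hand; the example rates below are hypothetical, corresponding to a uniform 16-value input reduced to 7 outputs):

```c
/* Sketch: average Shannon entropy per call, given the probability of
 * each output value.  The rates below are hypothetical. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const int m = 7;                       /* number of output values */
    double p[7] = { 3.0/16, 3.0/16, 2.0/16, 2.0/16, 2.0/16, 2.0/16, 2.0/16 };
    double H = 0.0;
    for (int v = 0; v < m; v++)
        H -= p[v] * log2(p[v]);            /* weighted average of -log2(p_v) */
    printf("entropy  = %.12f bits/call\n", H);
    printf("expected = %.12f bits/call\n", log2((double)m));
    return 0;
}
```

For a perfectly uniform generator the two printed values coincide; the gap between them is the per-call entropy loss.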
My concerns are:

1. Is this correct?
2. How do I compute these values without losing precision?
For #1, I'm wondering if I've got the formulation right.
For #2, the concern is that I would be processing numbers with magnitudes like 1/7 ± 1e-18, and I'm worried that floating-point error will kill me for anything but the smallest problems. The exact form of the computation could make a major difference here, and I seem to recall that there are some ASM options for some special log cases, but I can't seem to find the docs about this.
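One way to sidestep the cancellation (a sketch of a standard trick, not necessarily the ASM option remembered above): compute the deficit from uniform directly as the KL divergence D = Σ p_v · log2(m · p_v), so that H = log2(m) − D never has to be formed by subtracting two nearly equal doubles. Writing p_v = 1/m + δ_v gives m·p_v = 1 + m·δ_v, which C's `log1p` handles without losing the tiny δ_v:

```c
/* Sketch: entropy deficit from uniform, computed as the KL divergence
 * D = sum_v p_v * log2(m * p_v).  The deviations delta_v are kept
 * separately from 1/m: folded into a double they would mostly round away. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const int m = 7;  /* assumed output range size */
    /* hypothetical deviations from 1/m; a valid distribution needs sum = 0 */
    double delta[7] = { 1e-18, -1e-18, 1e-18, -1e-18, 1e-18, -1e-18, 0.0 };
    double deficit = 0.0;
    for (int v = 0; v < m; v++) {
        double w = 1.0 / m + delta[v];     /* weight p_v; rounding here is harmless */
        /* m*p_v = 1 + m*delta_v, so log2(m*p_v) = log1p(m*delta_v) / ln 2 */
        deficit += w * log1p(m * delta[v]) / log(2.0);
    }
    printf("entropy deficit = %.6e bits/call\n", deficit);
    return 0;
}
```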
In this case the use is: take a "good" PRNG for the range [1,n] and generate a PRNG for the range [1,m]. The question is how much worse the results are than the input.
What I have is expected occurrence rates for each output value.
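If the reduction is the usual modulo mapping, those rates can be computed exactly rather than estimated (a sketch; the reduction rule and the ranges are assumed for illustration):

```c
/* Sketch: expected occurrence rates when a uniform value in [1,n] is
 * reduced to [1,m] via ((x-1) % m) + 1.  Each output occurs either
 * floor(n/m) or floor(n/m)+1 times among the n equally likely inputs. */
#include <stdio.h>

int main(void)
{
    const int n = 10, m = 7;               /* assumed ranges for illustration */
    for (int v = 0; v < m; v++) {
        int count = n / m + (v < n % m ? 1 : 0);
        printf("output %d: rate %d/%d = %.12f\n",
               v + 1, count, n, (double)count / n);
    }
    return 0;
}
```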
TestU01 has an even more exacting test set than Dieharder. The largest test set is called "BigCrush", but it takes a long time to execute, so there are also subsets called just "Crush" and "SmallCrush". The idea is to first try SmallCrush, and if the PRNG passes that, try Crush, and if it passes that, BigCrush. If it passes that too, it should be good enough.
You can get TestU01 here.
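Hooking a generator into TestU01 is straightforward; a minimal sketch (the xorshift32 stand-in is hypothetical, substitute your own 32-bit generator):

```c
/* Sketch: run TestU01's SmallCrush battery on a custom generator. */
#include "unif01.h"
#include "bbattery.h"

static unsigned int state = 2463534242u;   /* arbitrary nonzero seed */

/* stand-in generator: must return 32 random bits per call */
static unsigned int xorshift32(void)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

int main(void)
{
    unif01_Gen *gen = unif01_CreateExternGenBits("xorshift32", xorshift32);
    bbattery_SmallCrush(gen);              /* then Crush, then BigCrush */
    unif01_DeleteExternGenBits(gen);
    return 0;
}
```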