How to quantify the quality of a pseudorandom number generator?

406 views Asked by At

This is based on this question. A number of answers were proposed that generate non-uniform distributions and I started wondering how to quantify the non uniformity of the output. I'm not looking for patterning issues, just single value aspects.

What are the accepted procedures?


My current thinking is to computer the average Shannon entropy per call by computing the entropy of each value and taking a weighted average. This can then be compered to the expected value.

My concerns are

  1. Is this correct?
  2. How to compute these value without loosing precision?

For #1 I'm wondering if I've got it correct.

For #2 the concern is that I would be processing numbers with magnitudes like 1/7 +/- 1e-18 and I'm worried that the floating point errors will kill me for any but the smallest problems. The exact form of the computation could result in some major differences here and I seem to recall that there are some ASM options for some special log cases but I can't seem to find the docs about this.


In this case the use is take a "good" PRNG for the range [1,n] and generate a SRNG for the range [1,m]. The question is how much worse is the results than the input?

What I have is expected occurrence rates for each output value.

3

There are 3 answers

0
bkov On

TestU01 has an even more exacting test set than Dieharder. The largest test set is called "BigCrush", but it takes a long time to execute, so there are also subsets called just "Crush" and "SmallCrush". The idea is to first try SmallCrush, and if the PRNG passes that, try Crush, and if it passes that, BigCrush. If it passes that too, it should be good enough.

You can get TestU01 here.

0
Adam Davis On

NIST has a set of documents and tools for statistically analyzing random number generators cross a variety of metrics.

http://csrc.nist.gov/groups/ST/toolkit/rng/index.html

Many of these tests are also incorporated into the Dieharder PRNG test suite.

http://www.phy.duke.edu/~rgb/General/rand_rate.php

There are a ton of different metrics, because there are many, many different ways to use PRNGs. You can't analyze a PRNG in a vacuum - you have to understand the use case. These tools and documents provide a lot of information to help you in this, but at the end of the day you'll still have to understand what you actually need before you can determine of the algorithm is suitable. The NIST documentation is thorough, if somewhat dense.

-Adam

1
twk On

This page discusses one way of checking if you are getting a bad distribution: plotting the pseudo-random values in a field and then just looking at them.