Verify Knuth shuffle algorithm is as unbiased as possible

Question

2.9k views Asked by Adam Maras At 06 November 2009 at 04:07

I'm implementing a Knuth shuffle for a C++ project I'm working on. I'm trying to get the most unbiased results from my shuffle (and I'm not an expert on (pseudo)random number generation). I just want to make sure this is the most unbiased shuffle implementation.

draw_t is a byte type (typedef'd to unsigned char). items is the count of items in the list. I've included the code for random::get( draw_t max ) below.

for( draw_t pull_index = (items - 1); pull_index > 1; pull_index-- )
{
    draw_t push_index = random::get( pull_index );

    draw_t push_item = this->_list[push_index];
    draw_t pull_item = this->_list[pull_index];

    this->_list[push_index] = pull_item;
    this->_list[pull_index] = push_item;
}

The random function I'm using has been modified to eliminate modulo bias. RAND_MAX is assigned to random::_internal_max.

draw_t random::get( draw_t max )
{
    if( random::_is_seeded == false )
    {
        random::seed( );
    }

    int rand_value = random::_internal_max;
    int max_rand_value = random::_internal_max - ( max - ( random::_internal_max % max ) );

    do
    {
        rand_value = ::rand( );
    } while( rand_value >= max_rand_value );

    return static_cast< draw_t >( rand_value % max );
}

There are 5 answers

Nick Johnson On 06 November 2009 at 09:37

The Knuth shuffle itself is provably unbiased: there exists exactly one series of operations that yields each possible shuffle. It's unlikely your PRNG has enough bits of state to express every possible shuffle, however, so the real question is whether your PRNG is 'random enough' with regard to the set of shuffles it will actually produce, and whether your seeding strategy is secure enough.
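To put a number on the state-size point, one can compare log2(n!) (the bits needed to index every permutation of n items) with the generator's state or seed size. The sketch below is only an illustration; both function names are hypothetical, and having enough state bits is a necessary condition, not a guarantee that every shuffle is reachable.

```cpp
#include <cmath>
#include <cstddef>

// Bits needed to index every permutation of n items: log2(n!).
// lgamma(n + 1) is ln(n!), so dividing by ln(2) converts to bits.
double bits_for_all_permutations(std::size_t n) {
    return std::lgamma(static_cast<double>(n) + 1.0) / std::log(2.0);
}

// Necessary (not sufficient) condition: a generator with `state_bits` bits of
// state or seed can produce at most 2^state_bits distinct output sequences,
// and therefore at most that many distinct shuffles.
bool can_reach_every_permutation(std::size_t n, double state_bits) {
    return bits_for_all_permutations(n) <= state_bits;
}
```

For example, a 52-item list needs roughly 226 bits, so a generator seeded from a single 32-bit value cannot reach most of its orderings; 12 items (about 29 bits) is the largest list that fits in 32 bits.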

Only you can decide this, as it depends on the consequences of a shuffle that isn't random enough. If you're dealing with real money, for example, I would suggest switching to a cryptographically secure PRNG and improving your seeding strategy. Although most built-in PRNGs generate good randomness, they're also quite easy to reverse engineer, and calling seed() with no arguments probably seeds from the current time, which is pretty easy to predict.
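Standard C++ has no cryptographically secure generator as such, but one possible starting point is to feed std::shuffle directly from std::random_device instead of a time-seeded rand(). This is a sketch under an assumption: whether std::random_device is backed by a secure operating-system source is implementation-defined, so check your platform before relying on it. The function name is hypothetical.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Shuffles `list` using std::random_device as the bit source for every draw.
// ASSUMPTION: on the target platform random_device reads OS entropy; the
// standard allows a deterministic fallback, which would defeat the purpose.
void shuffle_from_os_entropy(std::vector<unsigned char>& list) {
    std::random_device device;
    std::shuffle(list.begin(), list.end(), device);
}
```

Drawing every index from the device is slower than seeding a PRNG once, but it avoids limiting the reachable shuffles to the size of a seed.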

Robert Harvey On 06 November 2009 at 04:22

Have a look at this article from Jeff Atwood:

Shuffling
http://www.codinghorror.com/blog/archives/001008.html

See also:

The Danger of Naïveté
http://www.codinghorror.com/blog/archives/001015.html

Svante On 08 December 2009 at 13:26

If I see that right, your random::get (max) doesn't include max.

This line:

draw_t push_index = random::get( pull_index );

then produces a classic off-by-one error, as your pull_index and push_index can never be the same. The resulting bias is subtle: no item can ever stay where it was before the shuffle. In an extreme example, two-item lists under this "shuffle" would always be reversed.
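A minimal sketch of the loop with the range fixed, so that push_index may equal pull_index. It uses std::uniform_int_distribution as a stand-in for the question's random::get, and the names index_inclusive and knuth_shuffle are hypothetical. It also runs down to index 1; the question's condition `pull_index > 1` appears to stop one step early.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: uniform index in [0, upper], with `upper` INCLUDED.
// std::uniform_int_distribution stands in for the question's random::get.
template <class Rng>
std::size_t index_inclusive(std::size_t upper, Rng& rng) {
    std::uniform_int_distribution<std::size_t> dist(0, upper);
    return dist(rng);
}

// Fisher-Yates (Knuth) shuffle: each position may swap with itself, and the
// loop visits every index from size - 1 down to 1.
template <class T, class Rng>
void knuth_shuffle(std::vector<T>& list, Rng& rng) {
    if (list.size() < 2) return;  // nothing to shuffle
    for (std::size_t pull_index = list.size() - 1; pull_index > 0; --pull_index) {
        const std::size_t push_index = index_inclusive(pull_index, rng);
        std::swap(list[push_index], list[pull_index]);
    }
}
```

With this version a two-item list comes out in either order, since the single step picks push_index from {0, 1}.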

Adolfo On 02 January 2015 at 16:59

#include <cstdlib> // srand() && rand()

/** Shuffle the first 'dim' values in array 'V[]'.
    - Implements the Fisher–Yates shuffle.
    - Uses the standard function 'rand()' for randomness.
    - Initializes the random sequence using 'seed'.
    - Uses at most 'dim - 1' swaps.
    \see http://stackoverflow.com/questions/1685339/
    \see http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm
*/
template <class T>
void Fisher_Yates_shuffle( T* V, unsigned dim , unsigned seed ) {
    srand(seed);
    if ( dim < 2 ) return; // nothing to shuffle; also avoids unsigned wrap-around of dim-1
    T temp;
    unsigned i,iPP;

    i   = dim-1;
    iPP = dim;
    while ( i>0 ) {
        unsigned j = rand() % iPP; // j in [0, i]; note that '%' carries a slight modulo bias
        if ( i!=j ) { // swap
            temp = V[i]; V[i] = V[j]; V[j] = temp;
        }
        iPP = i;
        --i;
    }
/*
    This implementation depends on the randomness of the random number
    generator used ['rand()' in this case].
*/
}

dsimcha On 06 November 2009 at 04:23 (Accepted Answer)

Well, one thing you could do as a black-box test is take some relatively small array size, perform a large number of shuffles on it, count how many times you observe each permutation, and then perform Pearson's Chi-square test to determine whether the results are uniformly distributed over the permutation space.
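The black-box test described above could be sketched roughly as follows. The function shuffles {0, ..., n-1} repeatedly, counts each observed permutation, and returns Pearson's chi-square statistic against a uniform expectation. The names chi_square_statistic and ReferenceShuffle are hypothetical, and the reference shuffle uses std::mt19937 purely as a stand-in for whatever shuffle is under test.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Returns Pearson's chi-square statistic for `trials` shuffles of n items,
// compared against a uniform distribution over all n! permutations.
// Degrees of freedom for the test: n! - 1.
template <class ShuffleFn>
double chi_square_statistic(std::size_t n, long trials, ShuffleFn shuffle_fn) {
    assert(n > 0 && trials > 0);
    std::map<std::vector<int>, long> counts;
    for (long t = 0; t < trials; ++t) {
        std::vector<int> items(n);
        std::iota(items.begin(), items.end(), 0);  // always start from 0..n-1
        shuffle_fn(items);
        ++counts[items];
    }

    double permutations = 1.0;  // n!
    for (std::size_t k = 2; k <= n; ++k) permutations *= static_cast<double>(k);

    const double expected = static_cast<double>(trials) / permutations;
    double statistic = 0.0;
    for (const auto& entry : counts) {
        const double diff = static_cast<double>(entry.second) - expected;
        statistic += diff * diff / expected;
    }
    // Each permutation that never appeared contributes (0 - E)^2 / E = E.
    statistic += (permutations - static_cast<double>(counts.size())) * expected;
    return statistic;
}

// Hypothetical shuffle under test: a plain Fisher-Yates loop on std::mt19937.
struct ReferenceShuffle {
    std::mt19937 rng{12345u};
    void operator()(std::vector<int>& v) {
        if (v.size() < 2) return;
        for (std::size_t i = v.size() - 1; i > 0; --i) {
            std::uniform_int_distribution<std::size_t> dist(0, i);
            std::swap(v[i], v[dist(rng)]);
        }
    }
};
```

For n = 4 there are 24 permutations and 23 degrees of freedom, so a statistic well above roughly 35.2 (the 5% critical value) would suggest bias. A call such as chi_square_statistic(4, 240000, ReferenceShuffle{}) gives about 10000 expected hits per permutation.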

On the other hand, the Knuth shuffle, AKA the Fisher-Yates shuffle, is proven to be unbiased as long as the random number generator that the indices are coming from is unbiased.
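Getting unbiased indices from rand() usually means rejection sampling with a cutoff that is a multiple of the bound. A minimal sketch follows, with hypothetical names accept_limit and uniform_below, and assuming rand() itself is uniform over 0 to RAND_MAX. The cutoff is computed from the number of possible values, RAND_MAX + 1, rather than from RAND_MAX. As far as I can tell, the cutoff in the question does not land on a multiple of max: with RAND_MAX = 32767 and max = 10 it evaluates to 32764, which accepts 32764 raw values.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: the largest multiple of `bound` that is <= `range`.
// Raw values below this cutoff fall into equally sized residue classes.
unsigned long accept_limit(unsigned long range, unsigned long bound) {
    return range - (range % bound);
}

// Hypothetical stand-in for random::get: uniform in [0, bound), bound EXCLUDED.
// ASSUMPTION: rand() is uniform over its range and was seeded by the caller.
unsigned uniform_below(unsigned bound) {
    // rand() yields RAND_MAX + 1 distinct values: 0 through RAND_MAX.
    const unsigned long range = static_cast<unsigned long>(RAND_MAX) + 1UL;
    assert(bound > 0 && bound <= range);
    const unsigned long limit = accept_limit(range, bound);
    unsigned long value;
    do {
        value = static_cast<unsigned long>(std::rand());
    } while (value >= limit);  // reject the uneven tail and draw again
    return static_cast<unsigned>(value % bound);
}
```

With a range of 32768 and a bound of 10, the cutoff is 32760, so each of the ten results is backed by exactly 3276 raw values.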
