Fisher Yates variation

2.2k views Asked by At

The classic Fisher Yates looks something like this:

void shuffle1(std::vector<int>& vec)
{
    int n = vec.size();
    for (int i = n - 1; i > 0; --i)
    {
        std::swap(vec[i], vec[rand() % (i + 1)]);
    }
}

Yesterday, I implemented the iteration "backwards" by mistake:

void shuffle2(std::vector<int>& vec)
{
    int n = vec.size();
    for (int i = 1; i < n; ++i)
    {
        std::swap(vec[i], vec[rand() % (i + 1)]);
    }
}

Is this version in any way worse (or better) than the first? Does it skew the resulting probabilities?

2

There are 2 answers

0
Pubby On BEST ANSWER

Yes it's even distribution assuming rand() is. We will prove this by showing that each input can generate each permutation with equal probability.

N=2 can be easily proven. We will draw it as a tree where the the children represent each string you can get by inserting the character after comma into the leftmost string.

  0,1   //input where 0,1 represent indices
01  10  //output. Represents permutations of 01. It is clear that each one has equal probability

For N, we will have every permutations for N-1, and randomly swapping the last character for N

    (N-1 0th permutation),N     .....          (N-1 Ith permutation),N ________________________  
      /              \                       /                   \                             \ 
0th permutation of N  1st permutation....   (I*N)th permutation   ((I*N)+1)th permutation .... (I*N)+(I-1)th permutation

This shitty induction should lead you to it having even distribution.


Example:

N=2:

  0,1
01  10 // these are the permutations. Each one has equal probability

N=3:

           0,1|2           // the | is used to separate characters that we will insert later
    01,2           10,2    // 01, 10 are permutations from N-1, 2 is the new value
 210 021 012   201 120 102 // these are the permutations, still equal probability

N=4: (curved to aid reading)

                                                           0,1|23

                                                       01,2|3  10,2|3

                                           012,3 021,3 210,3    102,3 120,3 201,3

0123 0132 0321 3230                                                                                  2013 2031 2310 3012
                    0213 0231 0312 3210                                          1203 1230 1302 3201
                                        2103 2130 2301 3102  1023 1032 1320 3021

etc

0
UncleBens On

Looks OK to me (assuming rand() % N is unbiased, which it isn't). It seems it should be possible to demonstrate that every permutation of input is produced by exactly 1 sequence of random choices, where each random choice is balanced.

Compare this with a faulty implementation, such as

for (int i = 0; i < v.size(); ++i) {
  swap(v[i], v[rand() % v.size()]);
}

Here you can see that there are nn equally likely ways to produce n! permutations, and since nn is not evenly divisible by n! where n > 2, some of those permutations have to be produced more often than others.