The following LINQPad code generates random sequence of unique integers from 0 to N and calculates the length of cycle for every integer starting from 0. In order to calculate cycle length for a given integer, it reads value from boxes
array at the index equal to that integer, than takes the value and reads from the index equal to that value, and so on. The process stops when the value read from array is equal to original integer we started with. The number of iterations spent to calculating the length of every cycle gets saved into a Dictionary
.
const int count = 100;
var random = new Random();
var boxes = Enumerable.Range(0, count).OrderBy(x => random.Next(0, count - 1)).ToArray();
string.Join(", ", boxes.Select(x => x.ToString())).Dump("Boxes");
var stats = Enumerable.Range(0, count).ToDictionary(x => x, x => {
var iterations = 0;
var ind = x;
while(boxes[ind] != x)
{
ind = boxes[ind];
iterations++;
}
return iterations;
});
stats.GroupBy(x => x.Value).Select(x => new {x.Key, Count = x.Count()}).OrderBy(x => x.Key).Dump("Stats");
stats.Sum(x => x.Value).Dump("Total Iterations");
Typical result looks as follows:
The results I am getting seem weird to me:
- The lengths of all cycles can be grouped into only few buckets (usually 3 to 7). I was hoping to see more distinct buckets.
- The number of elements in every bucket most of the time grows together with the bucket value they belong to. I was hoping that it would be more random.
I have tried several different randomize functions, like .NET's Random
and RandomNumberGenerator
classes, as well as random data generated from random.org. All of them seem to produce similar results.
Am I doing something wrong? Are those results expected from mathematical point of view? Or, perhaps, the pseudo nature of randomizing functions that I used have side effects?
What you are doing is generating a random permutation of size
count
. Then you check the properties of the permutation. If your random number generator is good, then you should observe the statistics of random permutations.The average number of cycles of length k is 1/k, for
k<count
. On average, there is 1 fixed point, 1/2 cycles of length 2, 1/3 cycles of length 3, etc. The average number of cycles of any length is therefore 1+1/2+1/3+...+1/count ~ ln count + gamma. There are a lot of neat properties of the distribution of the number of cycles. Very occasionally there are many cycles, but the average value of 2^# cycles is count+1.Your buckets correspond to the number of different cycle lengths, which is at most the number of cycles, but might be lower because of repeated cycle lengths. On average, few cycle lengths are repeated. Even as the count increases to infinity, and the average number of cycles increases to infinity, the average number of repeated cycle lengths stays finite.
In a permutation test in statistics, usually an example of bootstrapping, to analyze some types of data, you view it as an example of a permutation. For example, you might observe two quantities, x_i and y_i. You get a permutation by sorting the xs and ys, and seeing the index of the value of y paired with the kth x value. Then you compare statistics of this permutation with the properties of random permutations. This doesn't assume much about the underlying distributions, but it can still detect when x and y seem to be related. So, it's useful to know what to expect from random permutations.