There are shuffle algorithms like FisherYates. They take an array and return one with elements in random order. This runs in O(n).
What I'm trying to do is to implement a prioritized left-shuffle algorithm. What does that mean?
- Prioritized: It does not take an array of values. It takes an array of value-probability pairs. E.g.
[ (1, 60), (2, 10), (3, 10), (4, 20) ]
. Value 1 has 60%, value 2 has 10%, ... - left-shuffle: The higher the probability of a value, the higher its chances to be far on the left of the array.
Let's take this example [ (1, 10), (2, 10), (3, 60), (4, 20) ]
. The most probable result should be [ 3, 4, 1, 2 ]
or [ 3, 4, 2, 1 ]
.
I tried implementing this, but I haven't found any solution in O(n).
O(n^2) in pseudocode based on FisherYates:
sum = 100 #100%
for i = 0 to n-2:
r = random value between 0 and sum
localsum = 0
for j = i to n-1:
localsum = localsum + pair[j].Probability
if localsum >= r + 1:
swap(i, j)
break
sum = sum - pair[i].Probability
What could probably improve this a bit: Sorting the elements decreasing by probability right in the beginning to minimize the number of swaps and the iterations in the inner loop.
Is there a better solution (maybe even in O(n))?
Update of my first answer:
I've found a paper where the 'Roulette-wheel selection via stochastic acceptance' with O(1) is introduced. This makes the algorithm to O(n) and is simple to implement
Performance Output
So it's linear time in n (data size). I updated from my first answer the constant from "updated sum" to "maximum weight of all data items" But sure it depends on the max_weight konstant. If someone has a strategy to update max_weight in a proper way, the performance would increase.