I want to generate a 2D grid environment with rewards distributed (1s are rewards, 0s are no rewards) based on an alternation probability as defined in this paper by Falk and Konald.
The basic idea is that once a starting square (the top left, say) is seeded as 1 or 0, each adjacent square either keeps the value of the previous square or switches to the other value, and the probability of switching is the probability of alternation.
The paper describes the generation process as going from left to right and top to bottom. I am not clear on how the authors intended to implement it.
Algorithm:
- Seed top left square
- Left to Right: starting from 1,1 -- alternate with set probability
- Top to Bottom: starting from 1,1 -- alternate with set probability
- Obtain the reward matrix
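
The steps above leave open how a square with two already-filled neighbours (one to its left, one above) should be handled, so the following is only a minimal sketch of one possible reading, not necessarily what the authors did. It assumes that the first square of each row alternates relative to the square directly above it (top to bottom), and that every other square alternates relative to its left neighbour (left to right). The function name `generate_reward_grid` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def generate_reward_grid(n_rows, n_cols, p_alternate, seed=None):
    """Fill a 0/1 grid where each cell flips its reference neighbour
    with probability p_alternate (one assumed interpretation)."""
    rng = np.random.default_rng(seed)
    grid = np.zeros((n_rows, n_cols), dtype=int)

    # Seed the top-left square with 0 or 1, chosen uniformly.
    grid[0, 0] = rng.integers(0, 2)

    for r in range(n_rows):
        for c in range(n_cols):
            if r == 0 and c == 0:
                continue  # already seeded
            # Assumption: the reference cell is the left neighbour, or the
            # cell above when we are in the first column of a new row.
            prev = grid[r, c - 1] if c > 0 else grid[r - 1, 0]
            # Switch value with probability p_alternate, otherwise repeat.
            flip = rng.random() < p_alternate
            grid[r, c] = 1 - prev if flip else prev
    return grid

if __name__ == "__main__":
    print(generate_reward_grid(5, 5, p_alternate=0.7, seed=0))
```

Under this reading, `p_alternate = 0` gives a uniform grid, `p_alternate = 1` gives a checkerboard, and `p_alternate = 0.5` makes every square an independent coin flip. Note that only the first column is tied to the row above; if the paper intends each square to depend on both its left and upper neighbours, the update rule would need to change.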