Minimax algorithm with cache optimization

I am implementing a minimax algorithm to solve a game, and I'm trying to optimize it with a cache so that I don't evaluate the same position twice.

The problem is that the game tree contains cycles (so it is in fact not a tree but a directed graph), because I can arrive at a position I have already visited (like a repetition in chess). Combining a cache with a game that has cycles creates unexpected bugs.
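
To make this concrete, here is a simplified sketch of the kind of cached minimax I mean (Python; the `game` object and its methods `hash`, `is_terminal`, `score` and `successors` are placeholders for the game logic, not my real code):

```python
def minimax(game, state, maximizing, cache):
    """Minimax with a position cache, ignoring cycles for now."""
    key = game.hash(state)              # placeholder: any hashable encoding of the position
    if key in cache:
        return cache[key]

    if game.is_terminal(state):
        score = game.score(state)       # e.g. +1 win, -1 loss, 0 draw
    else:
        child_scores = (minimax(game, s, not maximizing, cache)
                        for s in game.successors(state))
        score = max(child_scores) if maximizing else min(child_scores)

    cache[key] = score
    return score

# NOTE: because the game graph contains cycles, this version never terminates;
# some kind of repetition/cycle detection is needed, and that is where the bug appears.
```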

For the sake of the example, let's suppose I'm implementing a chess AI, and that I have enough computational resources to evaluate and save every possible chess position once. That means I could evaluate the entire game tree if I never evaluate the same position twice (using a cache), but it would take far too much time without one.

Ideally, I would also like to change the rules and consider the game a draw after only 1 repetition (i.e. it is a draw as soon as you put the game in a state that has already been visited), instead of 3 as in real chess. (This is a bonus optimization; I have the same bugs without it.)

Intuitively, changing the number of allowed repetitions should not change the perfect strategy. The only effects it should have are:

  1. reducing the number of visited states
  2. preventing the bot from repeating a position twice before changing its move; it will play the alternative move the first time

With a basic cache implementation (save the score of a position as soon as it has been evaluated, and reuse it the next time that position is encountered), the algorithm performs the following steps (see the sketch after this list):

  1. Start the minimax algorithm.
  2. After a few recursive calls, it tries to evaluate a state S1; the full branch being evaluated is, for example, A -> B -> S1.
  3. It continues the minimax search down one of the branches from S1.
  4. After a few more recursive calls, it arrives at S1 again and tries to evaluate it; the full branch is now A -> B -> S1 -> C -> D -> E -> S1.
  5. It detects a cycle and evaluates S1 as a draw.
  6. Because the score of S1 is incorrect, the scores of the intermediate states will be incorrect too (C, D and E in the example above).
  7. Because of the cache implementation, wrong scores are stored for C, D and E in the cache.
  8. Later, the algorithm tries to evaluate state C in a completely different branch, for example A -> F -> G -> C.
  9. Because C is in the cache with a wrong score, states A, F and G get wrong scores too.

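Schematically, this buggy version looks something like the sketch below (simplified, with the same placeholder game API as above); the comments mark where the path-dependent "draw" score leaks into the cache:

```python
def minimax(game, state, maximizing, cache, path):
    key = game.hash(state)
    if key in cache:
        return cache[key]                # steps 8/9: a wrong cached score poisons other branches

    if key in path:
        # Step 5: repetition detected, so score S1 as a draw.
        # This "draw" is only valid relative to the current path, but the
        # ancestors' scores derived from it still end up in the cache below.
        return 0

    if game.is_terminal(state):
        score = game.score(state)
    else:
        path.add(key)
        child_scores = [minimax(game, s, not maximizing, cache, path)
                        for s in game.successors(state)]
        score = max(child_scores) if maximizing else min(child_scores)
        path.remove(key)

    cache[key] = score                   # steps 6/7: C, D and E get cached with scores
    return score                         # that depended on S1 being "a draw" on this path

# called as: minimax(game, start_state, True, {}, set())
```
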
The problem here is that the first occurrence of S1 and the second occurrence of S1 are in fact not the same state, if we take into consideration that the second one is a draw (because of the repetition) and the first one is not. But if we include the history of moves (or states) in each state, so that the second occurrence of S1 becomes a new state S2, the cache becomes useless, because we will never visit the same state twice (the history of moves is never the same).
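
In other words, to distinguish the two occurrences of S1, the cache key would have to include the path that led to the state, something like the placeholder below, and then the cache never gets a hit:

```python
def cache_key(game, state, history):
    # Including the move/state history in the key distinguishes the two
    # occurrences of S1, but it also makes every key unique in practice,
    # so the cache effectively never gets a hit.
    return (game.hash(state), tuple(game.hash(s) for s in history))
```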

If I change the rules to allow repetitions and only consider the game a draw after, let's say, 100 moves (which is far less efficient), I get exactly the same bug: the occurrence of S1 that is evaluated at move 100 (after many repetitions) is scored as a draw, and that has the same impact on the previous states.

So, how should I proceed to implement a cache mechanism in the minimax algorithm? I feel like it should be possible, but I couldn't find a solution that avoids this bug.
