getting wrong output on my tic tac toe negamax algorithm

87 views Asked by At

I've been learning computer gaming, I have successed on minimax algorithm with tic tac toe. but the negamax algorithm always gives the wrong move. can anyone tell me what's wrong with my code?

static negamax(board, alpha, beta, maxmizingPlayer) {
    const emptyCells = this.getEmptyCells(board);
    if (this.isGameOver(board) || emptyCells.length === 0) {
      const score = this.evaluate(board);
      return { score : maxmizingPlayer? score : -score };
    }
    let bestMove =  { score:  -Infinity };
    for (let i = 0; i < emptyCells.length; ++i) {
      const { x, y } = emptyCells[i];
      board[y][x] = maxmizingPlayer ? this.mySymbol : this.opSymbol;
      const move ={
        score: -this.negamax(board, -beta, -alpha, !maxmizingPlayer).score
      };
      board[y][x] = this.emptySymbol;
      move.x = x;
      move.y = y;
      if (move.score >= bestMove.score) {
        bestMove = move;
      }
      alpha = Math.max(alpha, bestMove.score);
      if (beta <= alpha) {
        break;
      }
    }
    return bestMove;
  }

static evaluate(board) {
    if (this.isWining(board, this.mySymbol)) {
      return Infinity;
    }
    if (this.isWining(board, this.opSymbol)) {
      return -Infinity;
    }
    return 0;
  }
1

There are 1 answers

4
8bithero On BEST ANSWER

I think the main issue might be in the way you are negating the score in the base case. A few comments below:

  1. In negamax you return { score: maxmizingPlayer ? score : -score } for your base case - The score should always be negated and not conditional on maxmizingPlayer. The reason for this is because negamax relies on the principle that a score that is beneficial to one player is equally disadvantageous to the other. So this should probably be return { score: -score };

  2. Not a deal breaker since it's consistent but you have a typo maxmizingPlayer should be maximizingPlayer (Unless this was intentional. In which case please ignore) :p

Hope that helps :)

UPDATE (based on your comment below) (In theory) changing the for loop shouldn't affect the correctness of the negamax algorithm. Given that it does might mean other factors are at play.

  1. The first thing to check would be to make sure your isWining function correctly identifies all winning conditions - You probably have but there can always be minor hidden bugs (Bonus: small typo, should be isWinning, as isWining would imply "is drinking wine" :p).

  2. A second thing to check might be the order moves are evaluated. This shouldn't affect the correctness but if changing the order the moves are evaluated changes the result there might be an issue with how states are being reverted or evaluated.

  3. When you revert the board state make sure the reversion is happening correctly and no state info is accidentally kept between iterations

  4. One final suggestion might be to check the randomness of equivalent moves. If there are multiple moves with the same evaluation score your algorithm might always choose the first one it encounters. This might help explain why changing the loop direction seemed to improve performance.

I think that's all i've got right now :S If none of that works then try double checking your alpha-beta pruning logic and make sure the maximizing and minimizing players are consistently represented. If none of that seems to help I guess you will probably need to the old fashioned way of adding loads of logging all the way through to try understand how it's evaluating and choosing moves.