I am learning reinforcement learning and self-play while training an agent for a two-player board game. I have the following problem:
Every action gives the acting player a reward. Normally, in the step function, player1 makes a move, the environment returns a reward for that move, and the turn passes to the other player.
When the other player makes a move, it also gets a reward.
The problem is that this is a competitive game, so player1 also needs to learn to minimize player2's reward. How can I include the reward player2 receives in player1's reward as a negative number?
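
To make the setup concrete, here is a minimal sketch of the idea, not code from an actual project. `ToyEnv`, its `step` signature, and `player1_rewards` are hypothetical names, and the reward values are made up. It assumes the players strictly alternate and that player1's learning signal for a move is its own reward minus the reward player2 collects on the reply:

```python
from typing import Callable, List, Tuple


class ToyEnv:
    """Hypothetical stand-in for the board game: each step pays the mover a scripted reward."""

    def __init__(self, rewards: List[float]) -> None:
        self._rewards = list(rewards)  # made-up per-move rewards, in play order
        self._t = 0

    def step(self, action: int) -> Tuple[float, bool]:
        # The action is ignored in this toy; a real game would update the board here.
        reward = self._rewards[self._t]
        self._t += 1
        done = self._t >= len(self._rewards)
        return reward, done


def player1_rewards(
    env: ToyEnv,
    policy1: Callable[[], int],
    policy2: Callable[[], int],
) -> List[float]:
    """Play one episode and return player1's adjusted reward for each of its moves."""
    adjusted = []
    done = False
    while not done:
        r1, done = env.step(policy1())      # player1 moves and is rewarded
        r2 = 0.0
        if not done:
            r2, done = env.step(policy2())  # player2 replies and is rewarded
        adjusted.append(r1 - r2)            # opponent's reward counts as negative
    return adjusted


env = ToyEnv([1.0, 0.5, 2.0, 3.0, 1.0])
result = player1_rewards(env, lambda: 0, lambda: 0)
print(result)
```

The point of the sketch is only the delay: player1's transition is not stored right after its own move, but after player2 has replied, so that the opponent's reward can be subtracted first.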