I have been reading the paper "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning" to understand Q-function factorisation. One passage reads: [see image]
Q function constraint
I cannot work out how this constraint ensures that the argmax of Q_tot over joint actions is the same as the tuple of per-agent argmaxes of each Q_a.
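To make the question concrete, here is a minimal sketch of the claim as I understand it: if Q_tot is non-decreasing in every Q_a (which I read as dQ_tot/dQ_a >= 0 for each agent a), then letting each agent pick its own greedy action should give the same joint action as a brute-force argmax over Q_tot. The Q-values, the weights, and the function names below are all made up for illustration, and the weighted sum is only a stand-in for the paper's mixing network, not its actual architecture.

```python
from itertools import product

# Hypothetical per-agent utilities Q_a(u_a): two agents, three actions each.
q_agents = [
    [0.2, 1.5, -0.3],  # agent 1
    [2.0, -1.0, 0.7],  # agent 2
]


def mix(qs, weights):
    """Toy mixer: a weighted sum. It is monotonic in every Q_a
    exactly when all weights are non-negative."""
    return sum(w * q for w, q in zip(weights, qs))


def joint_argmax(q_agents, weights):
    """Brute-force argmax of Q_tot over every joint action."""
    joint_actions = product(*(range(len(q)) for q in q_agents))
    return max(
        joint_actions,
        key=lambda u: mix([q[a] for q, a in zip(q_agents, u)], weights),
    )


def decentralised_argmax(q_agents):
    """Each agent greedily picks the argmax of its own Q_a."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_agents)


greedy = decentralised_argmax(q_agents)
monotonic = joint_argmax(q_agents, [0.5, 2.0])       # all weights >= 0
non_monotonic = joint_argmax(q_agents, [0.5, -2.0])  # one negative weight

print(greedy)         # (1, 0)
print(monotonic)      # (1, 0), same as the per-agent greedy choice
print(non_monotonic)  # (1, 1), the match breaks without monotonicity
```

In this toy case the joint argmax matches the per-agent argmaxes only when the weights are non-negative, which seems to be what the constraint is meant to guarantee. What I am missing is why this holds in general.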
Can anyone help me understand the intuition here?
I have looked online for resources that explain this but have not found any yet.