Using a non-negative derivative to enforce the same results between functions


I have been reading the paper "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning" to understand some concepts about Q-function factorisation. One part reads: [image: Q-function constraint]. I have been searching online, without success, to understand how this constraint ensures that the argmax of Q_tot is the same as the set of argmaxes of the individual Q_a.

Can anyone help me understand the intuition here?

I searched online for resources that provide some insight, but have found nothing yet.
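To make the question concrete, here is a minimal numerical sketch, assuming the constraint in the image is the monotonicity condition the title refers to, i.e. ∂Q_tot/∂Q_a ≥ 0 for every agent a. The intuition it tries to show: if Q_tot never decreases when any single Q_a increases, then letting each agent pick the action that maximises its own Q_a cannot be beaten by any other joint action. The mixer below is a toy stand-in (non-negative weights plus a monotonic ELU activation), not the mixing network from the paper, and names such as `make_monotonic_mixer`, `per_agent_argmax` and `joint_argmax` are hypothetical.

```python
import itertools
import math
import random

def elu(x):
    # ELU is monotonically increasing, so it preserves the ordering of its input.
    return x if x > 0 else math.expm1(x)

def make_monotonic_mixer(n_agents, hidden=4, seed=0):
    """Toy mixer Q_tot = f(Q_1, ..., Q_n) with dQ_tot/dQ_a >= 0 (an assumption, not QMIX itself)."""
    rng = random.Random(seed)
    # abs() keeps every weight non-negative; biases are left unconstrained.
    w1 = [[abs(rng.gauss(0, 1)) for _ in range(n_agents)] for _ in range(hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(hidden)]
    w2 = [abs(rng.gauss(0, 1)) for _ in range(hidden)]
    b2 = rng.gauss(0, 1)

    def mixer(qs):
        h = [elu(sum(w * q for w, q in zip(row, qs)) + b) for row, b in zip(w1, b1)]
        return sum(w * x for w, x in zip(w2, h)) + b2

    return mixer

def joint_argmax(mixer, q_tables):
    """Brute-force argmax of Q_tot over every joint action."""
    joint_actions = itertools.product(*(range(len(q)) for q in q_tables))
    return max(joint_actions, key=lambda u: mixer([q[a] for q, a in zip(q_tables, u)]))

def per_agent_argmax(q_tables):
    """Each agent greedily maximises its own Q_a, ignoring the other agents."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)

# Made-up per-agent Q-values: two agents, three actions each.
q_tables = [[0.2, 1.5, -0.3], [2.0, 0.1, 0.7]]

mixer = make_monotonic_mixer(n_agents=2)
print("monotonic mixer:", per_agent_argmax(q_tables), joint_argmax(mixer, q_tables))

# Counterexample: this mixer decreases in Q_2, so it violates the constraint.
bad_mixer = lambda qs: qs[0] - qs[1]
print("non-monotonic mixer:", per_agent_argmax(q_tables), joint_argmax(bad_mixer, q_tables))
```

With the non-monotonic `bad_mixer`, the joint argmax is (1, 1) while the per-agent argmax is (1, 0): the second agent would have to pick its worst-looking action to maximise Q_tot, which is exactly the situation the non-negativity constraint appears to rule out.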
