The book *Reinforcement Learning: An Introduction* by Sutton and Barto says the following about non-stationary RL problems:
"we often encounter reinforcement learning problems that are effectively nonstationary. In such cases, it makes sense to weight recent rewards more heavily than long-past ones. " (see here -https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node20.html)
I am not absolutely convinced by this. For example, an explorer agent whose task is to find the exit of a maze might actually lose because of a wrong choice it made in the distant past.
Could you please explain, in simple terms, why it makes sense to give more weight to recent rewards?
If the problem is non-stationary, then past experience is increasingly out of date and should be given lower weight. The statement only applies when the environment itself changes over time: old observations describe a world that no longer exists, so if the explorer made a mistake in the distant past, that mistake is gradually overwritten by more recent experience. If the maze never changes, the problem is stationary and weighting all rewards equally is fine.
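Concretely, the book implements this with a constant step-size update, Q_{n+1} = Q_n + α(R_n − Q_n), which gives a reward observed i steps ago a weight proportional to α(1 − α)^i, i.e. old rewards fade geometrically. Below is a minimal sketch (not code from the book) comparing that update with a plain sample average on a single bandit arm whose true value drifts; the step size α = 0.1 and the drift magnitude are illustrative choices, not anything prescribed by the text.

```python
import random

random.seed(0)

true_value = 0.0      # q*(a), slowly drifting => non-stationary
q_sample_avg = 0.0    # equal-weight (sample average) estimate
q_recency = 0.0       # exponential recency-weighted estimate
alpha = 0.1           # constant step size (illustrative choice)

for n in range(1, 10001):
    true_value += random.gauss(0, 0.01)        # the environment changes
    reward = true_value + random.gauss(0, 1)   # noisy reward sample

    # Sample average: step size 1/n shrinks, so every past reward keeps
    # equal weight and the estimate reacts more and more slowly.
    q_sample_avg += (reward - q_sample_avg) / n

    # Constant alpha: the weight on a reward i steps in the past is
    # alpha * (1 - alpha)**i, so old rewards fade and the estimate tracks.
    q_recency += alpha * (reward - q_recency)

print(f"true value       : {true_value:+.3f}")
print(f"sample average   : {q_sample_avg:+.3f}")  # lags behind the drift
print(f"recency-weighted : {q_recency:+.3f}")     # follows the drift
```

Running this, the recency-weighted estimate stays close to the drifting true value while the sample average lags behind it, which is the practical reason for weighting recent rewards more heavily in non-stationary problems.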