I've seen a definition like this:
A policy defines the learning agent's way of behaving at a given time. Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states.
But I still don't fully understand it. What exactly is a policy in reinforcement learning?
The definition is correct, though not instantly obvious if you see it for the first time. Let me put it this way: a policy is an agent's strategy.
For example, imagine a world where a robot moves across a room and the task is to reach a target point (x, y), where it receives a reward. Here, the room is the environment and the robot's current position is the state.
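A policy is what the agent does to accomplish this task. For instance, a naive robot might wander around randomly until it happens to reach the target, while a smarter one might plan a route and head straight for it.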
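To make the robot example concrete, here is a minimal Python sketch of two different policies for the same task. It assumes, for simplicity, that the room is an unbounded integer grid, that the state is the robot's (x, y) position, and that the only actions are unit moves right, left, up and down. The names `random_policy`, `greedy_policy` and `run_episode` are hypothetical, made up for this illustration.

```python
import random
from typing import Callable, Optional, Tuple

Position = Tuple[int, int]  # the state: the robot's (x, y) position
Action = Tuple[int, int]    # a unit move (dx, dy)
# A policy maps the current state to an action. The target is fixed for the
# task and is passed in only for convenience; rng is used by random policies.
Policy = Callable[[Position, Position, random.Random], Action]

ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # right, left, up, down


def random_policy(state: Position, target: Position, rng: random.Random) -> Action:
    """Wander: ignore the state and pick a move uniformly at random."""
    return rng.choice(ACTIONS)


def greedy_policy(state: Position, target: Position, rng: random.Random) -> Action:
    """Head straight for the target: close the x gap first, then the y gap."""
    x, y = state
    tx, ty = target
    if x != tx:
        return (1, 0) if tx > x else (-1, 0)
    if y != ty:
        return (0, 1) if ty > y else (0, -1)
    return (0, 0)  # already at the target (run_episode stops before asking)


def run_episode(policy: Policy, start: Position, target: Position,
                rng: random.Random, max_steps: int = 1000) -> Optional[int]:
    """Follow `policy` from `start`; return the number of steps taken to reach
    `target`, or None if it is not reached within `max_steps`."""
    state = start
    steps = 0
    while state != target:
        if steps == max_steps:
            return None
        dx, dy = policy(state, target, rng)
        state = (state[0] + dx, state[1] + dy)
        steps += 1
    return steps


rng = random.Random(42)
print(run_episode(greedy_policy, (0, 0), (3, 4), rng))  # 7
print(run_episode(random_policy, (0, 0), (3, 4), rng))  # depends on the seed
```

The greedy policy needs exactly 7 steps here (three moves along x, four along y), while the random one usually needs far more steps or gives up at `max_steps`. Both are valid policies for the same environment; one is simply much better than the other.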
Obviously, some policies are better than others, and there are multiple ways to assess them, namely the state-value function and the action-value function. The goal of RL is to learn the best policy. Now the definition should make more sense (note that in this context, "time" is better understood as "state"):
A policy defines the learning agent's way of behaving at a given time.
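To make "assessing a policy" a bit more concrete, here is a minimal sketch of iterative policy evaluation, a standard way to compute the state-value function V^π(s), i.e. the expected discounted return when starting in state s and following π. The environment is an assumed toy version of the robot task: a one-dimensional corridor with positions 0 to 4, the target at position 4, a reward of 1 only when the robot enters the target, and γ = 0.9. These numbers and the function names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

N = 5                # positions 0..4 in a corridor (assumed toy environment)
TARGET = N - 1       # terminal state where the reward is collected
GAMMA = 0.9          # discount factor (assumed)


def step(s: int, a: int) -> Tuple[int, float]:
    """Deterministic transition: walls at both ends, reward 1 only when entering TARGET."""
    s_next = min(max(s + a, 0), N - 1)
    reward = 1.0 if s_next == TARGET else 0.0
    return s_next, reward


def evaluate_policy(pi: Dict[int, Dict[int, float]], theta: float = 1e-10) -> List[float]:
    """Iterative policy evaluation: sweep over states, applying the Bellman
    expectation backup V(s) <- sum_a pi(a|s) * (r + gamma * V(s')) in place,
    until the largest change in a sweep drops below theta."""
    V = [0.0] * N
    while True:
        delta = 0.0
        for s in range(N):
            if s == TARGET:
                continue  # terminal state keeps value 0
            new_v = 0.0
            for a, p in pi[s].items():
                s_next, r = step(s, a)
                new_v += p * (r + GAMMA * V[s_next])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            return V


# Two policies over the actions -1 (move left) and +1 (move right).
random_pi = {s: {-1: 0.5, +1: 0.5} for s in range(N)}  # wander
right_pi = {s: {+1: 1.0} for s in range(N)}            # always move right

V_random = evaluate_policy(random_pi)
V_right = evaluate_policy(right_pi)
print([round(v, 3) for v in V_random])
print([round(v, 3) for v in V_right])
```

For the "always move right" policy the values come out at roughly [0.729, 0.81, 0.9, 1.0, 0.0], and each entry is at least as large as the corresponding value for the random policy. That is the usual sense in which one policy is said to be better than another: its value is at least as high in every state.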
Formally
More formally, we should first define a Markov Decision Process (MDP) as a tuple $(S, A, P, R, \gamma)$, where:
- $S$ is a finite set of states
- $A$ is a finite set of actions
- $P$ is a state transition probability matrix (the probability of ending up in each next state, for each current state and each action)
- $R$ is a reward function, given a state and an action
- $\gamma$ is a discount factor between 0 and 1
Then, a policy $\pi$ is a probability distribution over actions given states, $\pi(a \mid s) = \Pr[A_t = a \mid S_t = s]$, that is, the likelihood of every action when the agent is in a particular state (of course, I'm skipping a lot of details here). This definition corresponds to the second part of your definition.
I highly recommend David Silver's RL course, available on YouTube. The first two lectures focus specifically on MDPs and policies.
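If it helps to see these definitions as data, here is a minimal Python sketch of the tuple (S, A, P, R, γ) and of a stochastic policy π(a | s) stored as one probability distribution over actions per state. The two-state example MDP, its numbers, and all names (`MDP`, `Policy`, `sample_action`, `expected_reward`, etc.) are hypothetical, chosen only for illustration; P and R are stored as dictionaries rather than as a matrix.

```python
import random
from typing import Dict, NamedTuple, Tuple

State = str
Action = str


class MDP(NamedTuple):
    """The tuple (S, A, P, R, gamma); field names here are illustrative."""
    states: Tuple[State, ...]                          # S
    actions: Tuple[Action, ...]                        # A
    P: Dict[Tuple[State, Action], Dict[State, float]]  # P[(s, a)][s'] = Pr[s' | s, a]
    R: Dict[Tuple[State, Action], float]               # R[(s, a)] = expected immediate reward
    gamma: float                                       # discount factor in [0, 1]


# A policy: for each state s, a distribution pi(a | s) over actions.
Policy = Dict[State, Dict[Action, float]]


def is_valid_policy(mdp: MDP, pi: Policy, tol: float = 1e-9) -> bool:
    """True if pi gives every state a probability distribution over mdp.actions.
    Actions missing from pi[s] are treated as having probability 0."""
    for s in mdp.states:
        dist = pi.get(s)
        if dist is None or not set(dist) <= set(mdp.actions):
            return False
        if any(p < 0 for p in dist.values()) or abs(sum(dist.values()) - 1.0) > tol:
            return False
    return True


def sample_action(pi: Policy, s: State, rng: random.Random) -> Action:
    """Draw a ~ pi(. | s)."""
    actions = list(pi[s])
    return rng.choices(actions, weights=[pi[s][a] for a in actions], k=1)[0]


def sample_next_state(mdp: MDP, s: State, a: Action, rng: random.Random) -> State:
    """Draw s' ~ P(. | s, a)."""
    dist = mdp.P[(s, a)]
    next_states = list(dist)
    return rng.choices(next_states, weights=[dist[x] for x in next_states], k=1)[0]


def expected_reward(mdp: MDP, pi: Policy, s: State) -> float:
    """Expected immediate reward in s under pi: sum_a pi(a|s) * R(s, a)."""
    return sum(p * mdp.R[(s, a)] for a, p in pi[s].items())


# Hypothetical two-state example loosely based on the robot task.
example_mdp = MDP(
    states=("away", "at_target"),
    actions=("move", "stay"),
    P={
        ("away", "move"): {"at_target": 0.8, "away": 0.2},  # moving may fail
        ("away", "stay"): {"away": 1.0},
        ("at_target", "move"): {"at_target": 1.0},
        ("at_target", "stay"): {"at_target": 1.0},
    },
    R={
        ("away", "move"): 0.8,  # reward 1 on reaching the target, with probability 0.8
        ("away", "stay"): 0.0,
        ("at_target", "move"): 0.0,
        ("at_target", "stay"): 0.0,
    },
    gamma=0.9,
)

# A stochastic policy: in "away" move 70% of the time; in "at_target" always stay.
pi: Policy = {
    "away": {"move": 0.7, "stay": 0.3},
    "at_target": {"stay": 1.0},
}

rng = random.Random(0)
a = sample_action(pi, "away", rng)
s_next = sample_next_state(example_mdp, "away", a, rng)
print(a, s_next)
```

A deterministic policy is just the special case where each state puts probability 1 on a single action, as `pi["at_target"]` does here.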