I am trying to perform reinforcement learning with the DDPG algorithm in my custom environment. I looked at various OU noise implementations here, but I couldn't find one that fits my environment.
Details: the actor network outputs a total of four actions, e.g. tensor([0.5914, 0.5693, 0.5467, 0.6196], device='cuda:0'). All actions are in the range 0 to 1 because the last layer passes through a sigmoid.
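The output head of my actor looks roughly like this (a minimal sketch only; the hidden size and layer count are placeholders, not my real network):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    # Sketch only: hidden size and depth are placeholders.
    def __init__(self, state_dim, action_dim=4):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 128)
        self.fc2 = nn.Linear(128, action_dim)

    def forward(self, state):
        x = torch.relu(self.fc1(state))
        # Sigmoid keeps all four actions in the range (0, 1).
        return torch.sigmoid(self.fc2(x))
```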
The following is the OUNoise class that I use:

```python
import copy
import random

import numpy as np
import torch


class OUNoise:
    """Ornstein-Uhlenbeck process."""

    def __init__(self, size, seed, mu=0., theta=0.15, sigma=0.1):
        """Initialize parameters and noise process."""
        self.mu = mu * torch.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.seed = random.seed(seed)
        np.random.seed(seed)  # sample() below draws from np.random
        self.reset()

    def reset(self):
        """Reset the internal state (= noise) to the mean (mu)."""
        self.state = copy.copy(self.mu)

    def sample(self):
        """Update the internal state and return it as a noise sample."""
        x = self.state
        dx = self.theta * (self.mu - x) + self.sigma * torch.tensor(
            np.random.normal(size=len(x)), dtype=torch.float32)
        self.state = x + dx
        return self.state
```
When I train using action + OU noise, the agent ends up learning an action that does nothing. The task is to track a target.
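This is roughly how I combine the action and the noise (simplified; `actor`, `state`, and the noise parameters are stand-ins for my actual setup):

```python
noise = OUNoise(size=4, seed=0)

with torch.no_grad():
    action = actor(state)            # four values in (0, 1) from the sigmoid
# Add exploration noise, then clip back into the valid action range.
action = action.cpu() + noise.sample()
action = torch.clamp(action, 0.0, 1.0)
```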
What I want to ask is how to set up the OU noise when the action range is 0 to 1 (the mean, the standard deviation, and so on; in particular, I think the dx = self.theta * (self.mu - x) + self.sigma * torch.tensor(np.random.normal(size=len(x)), ...) line in sample() is the important one).
I have already tried changing np.random.normal() in that line to np.random.normal(loc=0.5, scale=0.2), np.random.random(), and np.random.uniform(-1, 1), but there was no improvement. The reason the action range is 0 to 1 is that, when applying the action to the actual environment, it is easier to convert the action value using a sigmoid in the last layer than using tanh.
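To illustrate what I mean about the conversion (a sketch with made-up bounds; my real environment has different low/high values):

```python
low, high = 0.0, 10.0

# A sigmoid output a is already in (0, 1), so rescaling is a single affine map:
a_sigmoid = 0.59
env_action = low + a_sigmoid * (high - low)

# A tanh output would be in (-1, 1) and would need shifting into (0, 1) first:
a_tanh = 0.18
env_action = low + (a_tanh + 1.0) / 2.0 * (high - low)
```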