How to apply DDPG OU noise to my environment


I am trying to do reinforcement learning with the DDPG algorithm in a custom environment. I have looked at various OU noise implementations here, but I couldn't find one that fits my environment.

Detail: The actor network outputs a total of four actions, e.g. tensor([0.5914, 0.5693, 0.5467, 0.6196], device='cuda:0'). All actions lie between 0 and 1 because the last layer is passed through a sigmoid.
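For reference, the output head of my actor is essentially the following (the hidden size and layer names are just placeholders, not my exact network):

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """Sketch of the actor head: four actions squashed into (0, 1) with a sigmoid."""

        def __init__(self, state_size, hidden_size=128, action_size=4):
            super().__init__()
            self.fc1 = nn.Linear(state_size, hidden_size)
            self.fc2 = nn.Linear(hidden_size, action_size)

        def forward(self, state):
            x = torch.relu(self.fc1(state))
            # the sigmoid keeps every action component between 0 and 1
            return torch.sigmoid(self.fc2(x))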

And the following is the OUNoise class that I use:

    import copy
    import random

    import numpy as np
    import torch


    class OUNoise:
        """Ornstein-Uhlenbeck process."""

        def __init__(self, size, seed, mu=0., theta=0.15, sigma=0.1):
            """Initialize parameters and noise process."""
            self.mu = mu * torch.ones(size)
            self.theta = theta
            self.sigma = sigma
            self.seed = random.seed(seed)
            self.reset()

        def reset(self):
            """Reset the internal state (= noise) to mean (mu)."""
            self.state = copy.copy(self.mu)

        def sample(self):
            """Update internal state and return it as a noise sample."""
            x = self.state
            dx = self.theta * (self.mu - x) + self.sigma * torch.tensor(np.array([np.random.normal() for i in range(len(x))]))
            self.state = x + dx
            return self.state

I am training the agent to track a target, but when I train with action + OU noise, it ends up learning an action that does nothing.
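For reference, the action + noise step is essentially the following (a simplified sketch with a placeholder action value; the clamp at the end is only there to keep the example inside [0, 1]):

    import torch

    # exploration step: perturb the actor output with OU noise, then keep it in range
    noise = OUNoise(size=4, seed=0)
    noise.reset()

    action = torch.tensor([0.5914, 0.5693, 0.5467, 0.6196])  # example actor output (sigmoid range)
    noisy_action = (action + noise.sample().float()).clamp(0.0, 1.0)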

What I want to ask is how the OU noise should be set up when the action range is 0 to 1 (the mean and standard deviation of the OU noise, and so on). In particular, I think the noise term torch.tensor(np.array([np.random.normal() for i in range(len(x))])) in the dx line is the important part.

I have already changed that line,

    dx = self.theta * (self.mu - x) + self.sigma * torch.tensor(np.array([np.random.normal() for i in range(len(x))]))

so that it uses np.random.normal(loc=0.5, scale=0.2), np.random.random(), and np.random.uniform(-1, 1) instead of np.random.normal(), but there was no improvement. Also, the reason the action range is 0 to 1 is that, when the action is applied to the real environment, it is easier to convert the action value by using a sigmoid in the last layer rather than tanh.
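In code, the three replacements I tried look like this, each as a drop-in for the dx line inside sample():

    # variant 1: Gaussian noise centred at 0.5 (NumPy's keyword for the standard deviation is scale)
    dx = self.theta * (self.mu - x) + self.sigma * torch.tensor(
        np.array([np.random.normal(loc=0.5, scale=0.2) for i in range(len(x))]))

    # variant 2: uniform noise in [0, 1)
    dx = self.theta * (self.mu - x) + self.sigma * torch.tensor(
        np.array([np.random.random() for i in range(len(x))]))

    # variant 3: uniform noise in [-1, 1)
    dx = self.theta * (self.mu - x) + self.sigma * torch.tensor(
        np.array([np.random.uniform(-1, 1) for i in range(len(x))]))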
