I'm trying to build an RL environment and an agent, and I'm having some difficulty understanding how the pieces fit together. My problem is to fit two curves according to some rules. I think I managed to create a custom environment, but I couldn't figure out how to build an agent. In the first figure you can see that I have two curves. The second curve stays fixed, and the first curve should be fitted to it according to those rules.
(Figure: my curves as an example.)
This is my environment.
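import random

import numpy as np
from gym import Env
from gym.spaces import Box


class testEnv(Env):
    def __init__(self):
        # Actions we can take: up, down, wait
        # self.action_space = Discrete(3)
        # Continuous action: a single shift in [-1, 1]
        self.action_space = Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        # One bounded value per curve point (assumes 30 points, as in reset())
        self.observation_space = Box(low=-100.0, high=100.0, shape=(30,), dtype=np.float32)
        # Set start amp array (x_rl1 is defined outside this snippet)
        self.state = x_rl1 + random.uniform(-0.5, 0.5)
        # Set time (60 sec)
        self.time_length = 60

    def step(self, action):
        # Shift the first curve by the action, minus a constant 0.1
        self.state += action - 0.1
        self.time_length -= 1
        TMS_Env = 0.67
        # Second curve evaluated at every point of the current state
        y_Env = TMS_Env * 0.14 / (self.state**0.02 - 1)
        # Smallest pointwise gap between the two curves
        dt = np.min(np.subtract(self.state, y_Env))
        done = False
        # Was `dt<0.4 or dt>0.29`, which is true for every dt
        if 0.29 < dt <= 0.4:
            # Was `dt == 0.4`; exact float equality almost never holds
            if np.isclose(dt, 0.4):
                reward = 300
                done = True
            else:
                reward = 1
        else:
            reward = -1
        # Also stop when time runs out, without resetting done=True from above
        if self.time_length <= 0:
            done = True
        # Noise
        self.state += random.uniform(-1, 1)
        # info must be a dict; y_Env moves here so step() returns the usual 4-tuple
        info = {'dt': dt, 'y_Env': y_Env, 'state': self.state}
        return self.state, reward, done, info

    def reset(self):
        self.state = np.linspace(1.1, 30, 30)  # + random.randint(-2, 2)
        self.time_length = 80
        # reset() has to return the initial observation
        return self.state

    def render(self):
        pass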
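
The gap and reward rule used in step() can be checked on their own, outside the environment. The sketch below assumes the intended band is 0.29 < dt <= 0.4 and uses np.isclose in place of an exact float comparison; the helper names curve, min_gap and band_reward are hypothetical and not part of the original code.

```python
import numpy as np

TMS_ENV = 0.67  # same constant as TMS_Env in step()


def curve(x):
    """Second (fixed) curve evaluated at x; needs x > 1 to stay finite and positive."""
    return TMS_ENV * 0.14 / (x**0.02 - 1)


def min_gap(state):
    """Smallest pointwise difference between the state and the fixed curve."""
    return float(np.min(state - curve(state)))


def band_reward(dt, low=0.29, high=0.4):
    """Return (reward, done) under the assumed band low < dt <= high."""
    if low < dt <= high:
        if np.isclose(dt, high):
            return 300, True  # target gap reached, episode ends
        return 1, False       # inside the band, keep going
    return -1, False          # outside the band


state = np.linspace(1.1, 30, 30)  # initial state used in reset()
dt = min_gap(state)
reward, done = band_reward(dt)
print(f"dt={dt:.3f} reward={reward} done={done}")
```

With the initial state from reset(), the gap at the first point is strongly negative, so the agent starts far outside the band and initially only sees the -1 reward.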
When I run this manually, I get this:
As I said earlier, what I want to do is build an agent. As far as I know, DQN won't work for me because my action space is a Box (continuous) rather than Discrete, so I decided to use DDPG. This is where I'm stuck: I don't know what to do next.
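For context, DQN chooses an action by taking an argmax over a finite set of Q-values, which is why it needs a Discrete action space, whereas DDPG trains an actor that outputs the continuous action directly. The sketch below covers only two environment-facing pieces of such an agent, a deterministic actor with exploration noise and a replay buffer, in plain NumPy. The linear actor, the class names LinearActor and ReplayBuffer, the noise scale, the 30-dimensional observation and the placeholder reward are all assumptions for illustration; a working DDPG agent also needs a critic, target networks and gradient updates, which are left out here.

```python
import random
from collections import deque

import numpy as np


class LinearActor:
    """Deterministic policy: observation -> action in [-1, 1] (untrained)."""

    def __init__(self, obs_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(act_dim, obs_dim))
        self.b = np.zeros(act_dim)

    def act(self, obs, noise_std=0.0, rng=None):
        # tanh keeps the raw output inside the Box bounds [-1, 1]
        action = np.tanh(self.W @ obs + self.b)
        if noise_std > 0.0:
            if rng is None:
                rng = np.random.default_rng()
            # Gaussian exploration noise added to the deterministic action
            action = action + rng.normal(scale=noise_std, size=action.shape)
        # Clip so the noisy action still respects the action space
        return np.clip(action, -1.0, 1.0).astype(np.float32)


class ReplayBuffer:
    """Fixed-size store of (obs, action, reward, next_obs, done) transitions."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.data.append((obs.copy(), action.copy(), reward, next_obs.copy(), done))

    def sample(self, batch_size):
        # Uniform random minibatch, stacked into arrays
        batch = random.sample(list(self.data), batch_size)
        obs, act, rew, nxt, done = zip(*batch)
        return (np.stack(obs), np.stack(act), np.array(rew, dtype=np.float32),
                np.stack(nxt), np.array(done, dtype=np.float32))


actor = LinearActor(obs_dim=30, act_dim=1)
buffer = ReplayBuffer(capacity=1000)
obs = np.linspace(1.1, 30, 30)  # initial state from reset()
for _ in range(10):
    action = actor.act(obs, noise_std=0.1)
    next_obs = obs + action - 0.1  # same state update as step(), without the noise term
    buffer.add(obs, action, -1.0, next_obs, False)  # -1.0 is a placeholder reward
    obs = next_obs
batch = buffer.sample(4)
```

In a full agent, the transitions sampled from the buffer would feed the critic and actor updates, and the environment's own step() would supply next_obs, reward and done instead of the stand-in values used in this loop.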