I am trying to implement the DDPG algorithm from the paper.
In the image below, gk[n] and rk[n] are K×M matrices of real values, and theta[n] and v[n] are arrays of size M.
I want to write correct code to specify the state (observation) space of my custom environment.
Since the input to the neural network needs a single, unified data type, I thought the state array could be expressed as
observation_space = spaces.Box(low=0, high=1, shape=(K, M), dtype=np.float16......)
However, I am stuck on how to fit both matrices and both vectors into one observation space.
If you use stable-baselines3, you can use a Dict observation space filled with Box spaces that have meaningful limits for all your vectors and matrices (if the limits are unknown, you can always use +inf/-inf). The code could be something like: