TF-Agents _action_spec: how to define the correct shape for a discrete action space?


Scenario 1

My custom environment has the following _action_spec:

self._action_spec = array_spec.BoundedArraySpec(
            shape=(highestIndex+1,), dtype=np.int32, minimum=0, maximum=highestIndex, name='action')

The intention is that each action is a simple integer value between 0 and highestIndex. With this spec, utils.validate_py_environment(env, episodes=5) works perfectly fine and steps through my environment.
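For context, the rest of the environment looks roughly like this; the class name, observation spec, and reward/termination logic below are simplified placeholders, not my real code:

import numpy as np
from tf_agents.environments import py_environment, utils
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

class MyEnv(py_environment.PyEnvironment):  # hypothetical, simplified environment
    def __init__(self, highestIndex):
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(highestIndex + 1,), dtype=np.int32,
            minimum=0, maximum=highestIndex, name='action')
        # placeholder observation spec, not the real one
        self._observation_spec = array_spec.ArraySpec(
            shape=(4,), dtype=np.float32, name='observation')
        self._steps = 0
        self._episode_ended = False

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _reset(self):
        self._steps = 0
        self._episode_ended = False
        return ts.restart(np.zeros(4, dtype=np.float32))

    def _step(self, action):
        if self._episode_ended:
            # The last action ended the episode; start a new one.
            return self.reset()
        # placeholder dynamics: just count steps and terminate after 10
        self._steps += 1
        obs = np.zeros(4, dtype=np.float32)
        if self._steps >= 10:
            self._episode_ended = True
            return ts.termination(obs, reward=0.0)
        return ts.transition(obs, reward=0.0)

env = MyEnv(highestIndex=13207)
utils.validate_py_environment(env, episodes=5)  # passes with this spec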

I want to train a DQN. Therefore, I build a q_network:

q_net = q_network.QNetwork(
        train_env.observation_spec(),
        train_env.action_spec(),
        fc_layer_params=fc_layer_params)
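Here train_env is the custom py environment wrapped as a TF environment, presumably created along these lines:

from tf_agents.environments import tf_py_environment

# Wrap the py environment so its specs become tensor specs usable by the network.
train_env = tf_py_environment.TFPyEnvironment(env)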

Unfortunately, constructing the QNetwork fails with the following error:

ValueError: Network only supports action_specs with shape in [(), (1,)])
  In call to configurable 'QNetwork' (<class 'tf_agents.networks.q_network.QNetwork'>)

Scenario 2

I then tried changing the shape of _action_spec to () (as in these tutorials for a similar environment: https://www.tensorflow.org/agents/tutorials/2_environments_tutorial or https://towardsdatascience.com/tf-agents-tutorial-a63399218309):

self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=highestIndex, name='action')

With this spec I can create the q-network, but now utils.validate_py_environment(env, episodes=5) and driver.run() both fail with the following error:

TypeError: iteration over a 0-d array
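My guess is that the TypeError comes from code (most likely in my own _step) that iterates over the action, which with shape=() is now a 0-d array; the same error can be reproduced with plain NumPy:

import numpy as np

action = np.array(3, dtype=np.int32)  # a 0-d array, like an action sampled from a shape=() spec
for a in action:                      # raises: TypeError: iteration over a 0-d array
    pass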

How should I specify _action_spec to solve my issue?

Edit

Scenario 3

If I change the shape to shape=(1,) (suggested by norok2):

self._action_spec = array_spec.BoundedArraySpec(
            shape=(1,), dtype=np.int32, minimum=0, maximum=highestIndex, name='action')

I can build the q-network, but when I try to build the actual agent in the next step via

optimizer = tf.compat.v1.train.AdamOptimizer()

train_step_counter = tf.compat.v2.Variable(0)

tf_agent = dqn_agent.DqnAgent(
        train_env.time_step_spec(),
        train_env.action_spec(),
        q_network=q_net,
        optimizer=optimizer,
        td_errors_loss_fn = tf_agents.utils.common.element_wise_squared_loss,
        train_step_counter=train_step_counter)

tf_agent.initialize()

I get the following error:

ValueError: Only scalar actions are supported now, but action spec is: BoundedTensorSpec(shape=(1,), dtype=tf.int32, name='action', minimum=array(0), maximum=array(13207))
  In call to configurable 'DqnAgent' (<class 'tf_agents.agents.dqn.dqn_agent.DqnAgent'>)
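So both QNetwork and DqnAgent apparently require a scalar action spec, which makes me suspect the remaining problem in Scenario 2 is how my _step consumes the action. If that is right, the combination should look roughly like this (a sketch with a hypothetical handle_action helper, not my actual environment code):

import numpy as np
from tf_agents.specs import array_spec

# Scalar action spec as in Scenario 2 -- seemingly the only shape QNetwork and
# DqnAgent both accept for a single discrete action:
action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=13207, name='action')

# Inside _step the action then arrives as a 0-d array; treat it as a scalar
# instead of iterating over it (hypothetical helper, placeholder logic):
def handle_action(action):
    chosen_index = int(action)  # or action.item()
    return chosen_index

print(handle_action(np.array(7, dtype=np.int32)))  # -> 7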