How to use masking in keras-rl with DQNAgent?


I'm working on a project where I want to train an agent to find optimal routes in a road network (Graph). I built the custom Env with OpenAI Gym, and I'm building the model with Keras and training the agent with Keras-rl.

The problem is that pretty much every example I found about Deep Q Learning with Keras uses a fixed set of possible actions, but in my case the number of possible actions changes from node to node. For example: at the start node you might have only 2 neighbouring nodes available as steps, while a later node might have 4 possible nodes to go to.

I saw that one approach to this is to penalise the impossible steps with a negative reward, but that doesn't sound optimal to me.
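In case it helps, the penalty variant I read about would look roughly like this inside the env's step(); this is just a sketch, and all the attribute and helper names here (possible_steps, _get_obs, _edge_cost, target_node) are hypothetical, not my actual code:

def step(self, action):
    neighbors = self.possible_steps                 # hypothetical: neighbours of the current node
    if action >= len(neighbors):
        # impossible move: punish it and leave the state unchanged
        return self._get_obs(), -10.0, False, {}
    self.current_node = neighbors[action]
    reward = -self._edge_cost(self.current_node)    # hypothetical cost/reward for taking the edge
    done = self.current_node == self.target_node
    return self._get_obs(), reward, done, {}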

I found out that you can use gym.spaces.Discrete.sample(mask) to act as a filter of possible actions. The mask is an array like np.array([1, 1, 0, 0, 0, 0, 0, 0, 0], dtype=np.int8), where 1 means the corresponding action can be sampled and 0 means it can't. This works when testing my custom Env, and I don't have to redeclare the action space.
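For reference, this is roughly how I tested the masked sampling outside of the agent (with a recent Gym version where Discrete.sample accepts a mask; the action count here is just an example):

import numpy as np
from gym import spaces

action_space = spaces.Discrete(10)

# Suppose the current node only has 2 outgoing neighbours (actions 0 and 1).
mask = np.zeros(action_space.n, dtype=np.int8)
mask[[0, 1]] = 1

action = action_space.sample(mask=mask)  # only ever returns 0 or 1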

But how do I implement this in the agent's training process? The agent always picks one of the 10 possible actions (because that's what I pass as nb_actions to DQNAgent()), which sometimes results in an IndexError: list index out of range, because the possible steps are a list of the current node's neighbors.

Here is some of the code:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory


def build_model(env):
    model = Sequential()
    # keras-rl expects a window dimension, hence the leading 1 -> (1, 8) here
    input_shape = (1, env.observation_space.shape[0])
    model.add(Flatten(input_shape=input_shape))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    n_output_nodes = env.action_space.n  # one Q-value output per action
    model.add(Dense(n_output_nodes, activation='linear'))
    return model


def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    dqn = DQNAgent(
        model=model,
        memory=memory,
        policy=policy,
        nb_actions=actions,          # fixed number of actions the agent samples from
        nb_steps_warmup=10,
        target_model_update=1e-2,
    )
    return dqn

The model and the agent are built like this:

model = build_model(env)
dqn = build_agent(model, env.action_space.n)
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)
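One idea I've been toying with is to mask the Q-values inside a custom policy before an action is sampled, so the agent can never pick a non-existent neighbour during training. This is just an untested sketch: it assumes my env exposes a hypothetical get_action_mask() method that returns the 0/1 mask for the current state, and it reimplements Boltzmann sampling with the invalid actions zeroed out:

import numpy as np
from rl.policy import Policy

class MaskedBoltzmannQPolicy(Policy):
    """Boltzmann exploration restricted to the actions allowed by the env."""
    def __init__(self, env, tau=1.0, clip=(-500.0, 500.0)):
        super().__init__()
        self.env = env          # kept around so we can ask for the current mask
        self.tau = tau
        self.clip = clip

    def select_action(self, q_values):
        q_values = q_values.astype('float64')
        mask = np.asarray(self.env.get_action_mask(), dtype=bool)  # hypothetical helper
        exp_values = np.exp(np.clip(q_values / self.tau, self.clip[0], self.clip[1]))
        exp_values[~mask] = 0.0                  # invalid actions get zero probability
        probs = exp_values / np.sum(exp_values)
        return np.random.choice(len(q_values), p=probs)

I would then pass policy=MaskedBoltzmannQPolicy(env) in build_agent instead of BoltzmannQPolicy(). Is something like this the right way to do it with keras-rl's DQNAgent, or is there a built-in mechanism for action masking that I'm missing?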