Which Q-value do I select as the action from the output of my Deep Q-Network?

I'm working on a project that uses a Deep Q-Learning agent to learn how to select the appropriate nodes of a 1000-node NetworkX graph. My observation_space is a (1000, 3) array in which each row holds a node's label, its degree, and an attribute that is either 0, 1, or 2. The action_space has shape (1000, 1), with each element corresponding to taking an action on a specific node.
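
A minimal sketch of the data layout described above, written in plain NumPy instead of NetworkX so it runs on its own. The random edge list and the random 0/1/2 attribute are hypothetical stand-ins for the real graph; each row is [node label, degree, attribute], and an action is just an integer node index.

```python
import numpy as np

N_NODES = 1000
rng = np.random.default_rng(0)

# Hypothetical stand-in for the NetworkX graph: a random undirected edge list.
edges = rng.integers(0, N_NODES, size=(3000, 2))
edges = edges[edges[:, 0] != edges[:, 1]]  # drop self-loops

# Degree of each node = number of edge endpoints that touch it.
# (Duplicate edges are counted twice here; a NetworkX Graph would merge them.)
degree = np.bincount(edges.ravel(), minlength=N_NODES)

# Hypothetical per-node attribute taking the values 0, 1 or 2.
attribute = rng.integers(0, 3, size=N_NODES)

# Observation: one row per node -> [label, degree, attribute], shape (1000, 3).
observation = np.column_stack([np.arange(N_NODES), degree, attribute])

# An action is simply the index of the node to act on, in range(1000).
action = int(rng.integers(0, N_NODES))

print(observation.shape)  # (1000, 3)
```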

This is the code for my Deep Q Network:

import tensorflow as tf
from tensorflow.keras.optimizers import Adam

def nnmodel(observation_space, action_space):
    model = tf.keras.models.Sequential()
    # Hidden layers
    model.add(tf.keras.layers.Dense(128, input_shape=(None, observation_space.shape[0], observation_space.shape[1]), activation='relu'))
    model.add(tf.keras.layers.Dense(256, activation='relu'))
    model.add(tf.keras.layers.Dense(256, activation='relu'))
    # Output layer: one linear output per action (one Q-value per node)
    model.add(tf.keras.layers.Dense(len(action_space), activation='linear'))

    model.compile(optimizer=Adam(), loss='mse', metrics=['accuracy'])

    return model

As I understand it, this should give me the Q-values via

q_values = model.predict(observation_space)
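
One way to see where a (1000, 1000) shape can come from: Keras `Dense` layers act on the last axis of their input, and `model.predict` treats the first axis of the array it receives as the batch axis. Passing the (1000, 3) observation is therefore read as 1000 separate 3-feature samples, each mapped to `len(action_space)` = 1000 outputs. The NumPy forward pass below mimics the four layers with random weights (an assumption made purely to check shapes; it is not the trained model).

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, units, activation=None):
    """Mimic a Keras Dense layer: a matmul over the LAST axis, plus bias."""
    w = rng.normal(size=(x.shape[-1], units)) * 0.01
    b = np.zeros(units)
    y = x @ w + b
    return np.maximum(y, 0.0) if activation == "relu" else y

observation = rng.normal(size=(1000, 3))  # stand-in for the (1000, 3) observation
n_actions = 1000                           # len(action_space)

# The first axis (1000 nodes) is treated as the batch, so every node row
# is pushed through the network independently.
h = dense(observation, 128, "relu")
h = dense(h, 256, "relu")
h = dense(h, 256, "relu")
q_values = dense(h, n_actions)  # linear output layer

print(q_values.shape)  # (1000, 1000): one 1000-long vector per node row
```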

However, q_values has shape (1000, 1000), and I'm unsure which "highest Q-value" identifies the node the agent should act on. Is it the single highest entry, with its row or column index giving the node to select? Is it the row or column with the largest sum? Or is it something else entirely? Examples I've found online typically use np.argmax(q_values[0]), which I don't think applies to my case.
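
To make the candidate rules above concrete, here is what each one computes on a (1000, 1000) Q-value matrix. The random matrix is a placeholder for the `model.predict` output; this only shows what each expression returns, not which rule is correct for this model.

```python
import numpy as np

rng = np.random.default_rng(0)
q_values = rng.normal(size=(1000, 1000))  # placeholder for model.predict output

# 1) Single highest entry: (row, column) of the global maximum.
row, col = np.unravel_index(np.argmax(q_values), q_values.shape)

# 2) Largest row sum / largest column sum.
best_row_by_sum = int(np.argmax(q_values.sum(axis=1)))
best_col_by_sum = int(np.argmax(q_values.sum(axis=0)))

# 3) The pattern from online examples: argmax over the first row only.
#    Those examples typically have q_values of shape (1, n_actions), i.e. a
#    batch of ONE state, so q_values[0] is that state's full vector of Q-values.
first_row_choice = int(np.argmax(q_values[0]))

print(row, col, best_row_by_sum, best_col_by_sum, first_row_choice)
```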

Also, does my input_shape look correct for the problem I'm describing?
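
For comparison, the convention those online examples usually follow is one state per sample: in Keras, `input_shape` leaves out the batch dimension, and the network outputs a single vector with one Q-value per action. The sketch below assumes that convention and flattens the (1000, 3) observation into one 3000-feature sample (flattening is just one possible choice, labeled here as an assumption), again using random weights only to check shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, units, activation=None):
    """Mimic a Keras Dense layer: a matmul over the last axis (bias omitted)."""
    w = rng.normal(size=(x.shape[-1], units)) * 0.01
    y = x @ w
    return np.maximum(y, 0.0) if activation == "relu" else y

observation = rng.normal(size=(1000, 3))  # stand-in for the (1000, 3) observation
n_actions = 1000

# Treat the WHOLE graph as one sample: flatten to 3000 features and keep a
# batch axis of size 1 -> (1, 3000). In Keras terms this would correspond to
# input_shape=(3000,), since input_shape excludes the batch dimension.
state = observation.reshape(1, -1)

h = dense(state, 128, "relu")
h = dense(h, 256, "relu")
h = dense(h, 256, "relu")
q_values = dense(h, n_actions)  # shape (1, 1000): one Q-value per node

action = int(np.argmax(q_values[0]))  # index of the node to act on
print(q_values.shape, action)
```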

Any help is appreciated!

import numpy as np

# Find the largest Q-value and where it occurs in q_values
max_q = np.max(q_values)
position = np.where(q_values == max_q)
print(position)

This returns the index of the largest Q-value, but I'm unsure whether it means I should select node i (the row index) or node j (the column index).
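
For reference on what `position` holds: on a 2-D array, `np.where(q_values == max_q)` returns a tuple `(row_indices, column_indices)`, one array per axis. `np.unravel_index(np.argmax(q_values), q_values.shape)` gives the same (row, column) pair directly, keeping only the first maximum if there are ties. A small 3x3 example with a made-up matrix:

```python
import numpy as np

# Made-up 3x3 Q-value matrix whose maximum (0.9) sits at row 1, column 2.
q_values = np.array([[0.1, 0.5, 0.2],
                     [0.3, 0.4, 0.9],
                     [0.0, 0.6, 0.8]])

max_q = np.max(q_values)
position = np.where(q_values == max_q)
print(position)  # -> row indices [1], column indices [2]

# Same location as a plain (row, column) tuple, first maximum only.
row, col = np.unravel_index(np.argmax(q_values), q_values.shape)
print(int(row), int(col))  # 1 2
```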
