Input states for Deep Q Learning

811 views Asked by At

I am using the DQN for a resource allocation where the agent should assign the arrival requests to the best Virtual Machine. I am modifying a Cartpole code as follow:

import random
import gym
import numpy as np
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
import os 

class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95 
        self.epsilon = 1.0 
        self.epsilon_decay = 0.995 
        self.epsilon_min = 0.01 
        self.learning_rate = 0.001 
        self.model = self._build_model()
    
    def _build_model(self):
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu')) 
        model.add(Dense(24, activation='relu')) 
        model.add(Dense(self.action_size, activation='linear')) 
        model.compile(loss='mse', optimizer=Adam(lr=self.learning_rate))
        return model
    
    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done)) 

    def act(self, state):
        if np.random.rand() <= self.epsilon: 
            return random.randrange(self.action_size)
        act_values = self.model.predict(state) 
        return np.argmax(act_values[0])

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size) 
        for state, action, reward, next_state, done in minibatch: 
            target = reward 
            if not done: 
                target = (reward + self.gamma * np.amax(self.model.predict(next_state)[0])) 
            target_f = self.model.predict(state) 
            target_f[0][action] = target
            self.model.fit(state, target_f, epochs=1, verbose=0)

        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    def load(self, name):
        self.model.load_weights(name)

    def save(self, name):
        self.model.save_weights(name)

The Cartpole states as the inputs of the Q network are given by the environment.

0   Cart Position
1   Cart Velocity       -Inf    Inf
2   Pole Angle          ~ -41.8°    ~ 41.8°
3   Pole Velocity At Tip

The question is that in my code what are the inputs of the Q network? Since the agent should take the best possible action based on the size of the arrival request but this is not given by the environment. Shall I feed the Q network by this input value, the size?

1

There are 1 answers

0
HenDoNR On

The inputs of the Deep Q-Network architecture is fed by the replay memory, in the following part of the code:

def remember(self, state, action, reward, next_state, done):
    self.memory.append((state, action, reward, next_state, done))

The dynamic of this system as shown in the original paper Deepmind paper, is that you interact with the system, store the transition in the replay memory, and then use it for the training step. In the lines above you're storing these experiences.

Basically, the input of the network is the states and outputs the Q-values. In your code, there's no interaction with the environment, that's when you can get these transitions (experiences) to feed the replay memory. So, if you can't extract some information in the environment to be represented as states, you're not able to make assumptions about that.