Converting .npz model from ChainerRL to Keras model, or alternative methods?


I have a DQN reinforcement learning model that was trained using ChainerRL's built-in DQN experiment on the Ms Pacman Atari game environment. Let's call this file model.npz. I also have some analysis software written in Keras, which builds a Keras network and loads a saved model into it.

I am having trouble getting the .npz exported from ChainerRL to play nice with the Keras network.

I have figured out how to load the weights from the .npz file. I think I have also figured out how to make sure the Keras model matches the ChainerRL model in terms of kernel size, stride, and activation.
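For reference, a minimal sketch of how the arrays in an .npz archive can be listed with NumPy before any conversion is attempted. The helper name describe_npz is hypothetical, and the array names stored in a real ChainerRL file are not assumed here; the function simply reports whatever names and shapes the archive contains.

```python
import numpy as np


def describe_npz(path_or_file):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    with np.load(path_or_file) as archive:
        return {name: archive[name].shape for name in archive.files}


# Usage (the file name is the one used in this question):
# for name, shape in describe_npz("model.npz").items():
#     print(name, shape)
```

Printing the names next to the shapes makes it easier to tell which array is a kernel and which is a bias before mapping them onto Keras layers.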

Here is the code which calls the function that builds the network in ChainerRL:

return links.Sequence(
        links.NatureDQNHead(),
        L.Linear(512, n_actions),
        DiscreteActionValue)

And the code which gets called by this, and builds a Chainer DQN network, is:

class NatureDQNHead(chainer.ChainList):
    """DQN's head (Nature version)"""

    def __init__(self, n_input_channels=4, n_output_channels=512,
                 activation=F.relu, bias=0.1):
        self.n_input_channels = n_input_channels
        self.activation = activation
        self.n_output_channels = n_output_channels

        layers = [
            #L.Convolution2D(n_input_channels, out_channel=32, ksize=8, stride=4, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(n_input_channels, 32, 8, stride=4,
                            initial_bias=bias),
            #L.Convolution2D(n_input_channels=32, out_channel=64, ksize=4, stride=2, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(32, 64, 4, stride=2, initial_bias=bias),
            #L.Convolution2D(n_input_channels=64, out_channel=64, ksize=3, stride=1, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(64, 64, 3, stride=1, initial_bias=bias),
            #L.Linear(in_size=3136, out_size=n_output_channels, nobias=False, initialW=None, initial_bias=bias),
            L.Linear(3136, n_output_channels, initial_bias=bias),
        ]

        super(NatureDQNHead, self).__init__(*layers)

    def __call__(self, state):
        h = state
        for layer in self:
            h = self.activation(layer(h))
        return h

So I wrote the following Keras code to build an equivalent network in Keras:

# Keras Model
hidden = 512
#bias initializer to match the chainerRL one
initial_bias = tf.keras.initializers.Constant(0.1)

#matches default "channels_last" data format for Keras layers
inputs = Input(shape=(84, 84, 4))

#First call to Conv2D including all defaults for easy reference
x = Conv2D(filters=32, kernel_size=(8, 8), strides=4, padding='valid', data_format=None, dilation_rate=(1, 1), activation='relu', use_bias=True, kernel_initializer='glorot_uniform', bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, name='deepq/q_func/convnet/Conv')(inputs)
x1 = Conv2D(filters=64, kernel_size=(4, 4), strides=2, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_1')(x)
x2 = Conv2D(filters=64, kernel_size=(3, 3), strides=1, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_2')(x1)
#Flatten for move to linear layers
conv_out = Flatten()(x2)

action_out = Dense(hidden, activation='relu', name='deepq/q_func/action_value/fully_connected')(conv_out)
action_scores = Dense(units = 9, name='deepq/q_func/action_value/fully_connected_1', activation='linear', use_bias=True, kernel_initializer="glorot_uniform", bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None,)(action_out)  # num_actions in {4, .., 18}

#Now create model using the above-defined layers
modelArchitecture = Model(inputs, action_scores)

I have examined the structure of the initial weights for the Keras model and found them to be as follows:

  • Layer 0: no weights
  • Layer 1: (8,8,4,32)
  • Layer 2: (4,4,32,64)
  • Layer 3: (3,3,64,64)
  • Layer 4: no weights
  • Layer 5: (3136,512)
  • Layer 6: (512,9)

Then, I examined the weights in the .npz model which I am trying to import and found them to be as follows:

  • Layer 0: (32,4,8,8)
  • Layer 1: (64,32,4,4)
  • Layer 2: (64,64,3,3)
  • Layer 3: (512,3136)
  • Layer 4: (9,512)
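The shapes listed above can be cross-checked against the layer definitions with a little arithmetic. The following is a minimal sketch, assuming 84x84 input frames with 4 channels (as in the Keras Input above) and no padding. It derives the kernel shape each framework should report for the three convolutions and the flattened size fed to the first fully connected layer.

```python
def conv_out(size, ksize, stride, pad=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - ksize) // stride + 1


# (ksize, stride, out_channels) for the three convolutions defined above
layers = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]

size, in_ch = 84, 4  # assumed input: 84x84 frames, 4 stacked channels
chainer_shapes, keras_shapes = [], []
for ksize, stride, out_ch in layers:
    chainer_shapes.append((out_ch, in_ch, ksize, ksize))  # (c_O, c_I, h_K, w_K)
    keras_shapes.append((ksize, ksize, in_ch, out_ch))    # (h_K, w_K, c_I, c_O)
    size = conv_out(size, ksize, stride)
    in_ch = out_ch

flat = size * size * in_ch  # number of inputs to the first fully connected layer
print(chainer_shapes, keras_shapes, size, flat)
```

Under these assumptions the feature map after the third convolution is 7x7 with 64 channels, which matches the 3136 inputs of the first fully connected layer, and the third convolution has a 3x3 kernel in both frameworks.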

So, I reshaped the weights from Layer 0 of model.npz with numpy.reshape and applied them to Layer 1 of the Keras network. I did the same with the model.npz weights for Layer 1, and applied them to Layer 2 of the Keras network. Then, I reshaped the weights from Layer 2 of model.npz, and applied them to Layer 3 of the Keras network. I transposed the weights of Layer 3 from model.npz, and applied them to Layer 5 of the Keras model. Finally, I transposed the weights of Layer 4 of model.npz and applied them to Layer 6 of the Keras model.

I saved the model in .h5 format and then ran it through the evaluation code in the Ms Pacman Atari environment, which produces a video. When I do this, Pacman follows the exact same short path, runs face-first into a wall, and then keeps trying to walk through the wall until a ghost kills it.

It seems, therefore, that I am doing something wrong in my translation between the Chainer DQN network and the Keras DQN network. Could it be that they process the color channels in a different order, or something similar?

I also attempted to export the ChainerRL model.npz file to ONNX, but got several errors to the point where it didn't seem possible without rewriting a lot of the ChainerRL code base.

Any help would be appreciated.

There is 1 answer

Answer by muupan:

I am the author of ChainerRL. I have no experience with Keras, but the layouts of the weight parameters appear to differ between Chainer and Keras. You should check the meaning of each dimension of the weight parameters in each deep learning framework. In Chainer, as you can find in the documentation (https://docs.chainer.org/en/stable/reference/generated/chainer.functions.convolution_2d.html#chainer.functions.convolution_2d), the weight parameter of Convolution2D is stored as (c_O, c_I, h_K, w_K).
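As an illustration of what the axis order means in practice, here is a minimal NumPy sketch. It assumes the Keras Conv2D kernel layout is (h_K, w_K, c_I, c_O), which is consistent with the (8,8,4,32) shape reported in the question. The helper name chainer_conv_to_keras and the random stand-in weights are hypothetical.

```python
import numpy as np


def chainer_conv_to_keras(w):
    """Reorder a kernel from (c_O, c_I, h_K, w_K) to (h_K, w_K, c_I, c_O)."""
    return np.transpose(w, (2, 3, 1, 0))


rng = np.random.default_rng(0)
# Random stand-in for the first convolution's weights from model.npz
w_chainer = rng.standard_normal((32, 4, 8, 8)).astype(np.float32)

w_keras = chainer_conv_to_keras(w_chainer)
# Same target shape, but reshape keeps the flat memory order and so
# assigns values to the wrong (row, column, channel, filter) positions.
w_reshaped = w_chainer.reshape(8, 8, 4, 32)

o, i, h, w = 5, 2, 3, 7  # arbitrary filter, input channel, kernel row, kernel column
same_weight = w_keras[h, w, i, o] == w_chainer[o, i, h, w]
scrambled = not np.array_equal(w_keras, w_reshaped)
print(w_keras.shape, same_weight, scrambled)
```

Both arrays end up with the shape Keras expects, which is why a reshape raises no error, but only the transposed array keeps each weight attached to the same filter, input channel, and kernel position.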

Once you find the meaning of each dimension, I guess what you need is always numpy.transpose, not numpy.reshape, to re-order dimensions to match the order of Keras.
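One further point goes beyond what is stated above and should be treated as an assumption to verify: the fully connected layers. Chainer's Linear stores its weight as (out, in), while a Keras Dense kernel is (in, out), so a plain transpose handles the last layer. For the first fully connected layer, the 3136 inputs are a flattened feature map. If Chainer flattens it in (C, H, W) order and Keras (channels_last) in (H, W, C) order, the input axis also has to be re-ordered. The sketch below uses hypothetical helper names and random stand-in data. Here reshape is used only to split and merge axes, and the actual re-ordering is still done by transpose.

```python
import numpy as np


def chainer_linear_to_keras(w):
    """Chainer Linear weight (out, in) -> Keras Dense kernel (in, out)."""
    return w.T


def chainer_first_linear_to_keras(w, channels=64, height=7, width=7):
    """Convert the Linear layer that follows the flattened conv output.

    Assumes Chainer flattened the feature map as (C, H, W) and Keras
    flattens it as (H, W, C).
    """
    out_size = w.shape[0]
    w = w.reshape(out_size, channels, height, width)  # split the input axis
    w = np.transpose(w, (2, 3, 1, 0))                 # -> (H, W, C, out)
    return w.reshape(height * width * channels, out_size)


rng = np.random.default_rng(0)
w_chainer = rng.standard_normal((512, 3136))   # stand-in for the .npz weight
feat_chw = rng.standard_normal((64, 7, 7))     # one feature map, Chainer layout
feat_hwc = np.transpose(feat_chw, (1, 2, 0))   # same feature map, Keras layout

y_chainer = w_chainer @ feat_chw.reshape(-1)
y_keras = feat_hwc.reshape(-1) @ chainer_first_linear_to_keras(w_chainer)
y_plain_t = feat_hwc.reshape(-1) @ chainer_linear_to_keras(w_chainer)

print(np.allclose(y_chainer, y_keras), np.allclose(y_chainer, y_plain_t))
```

With the re-ordered kernel the two layouts produce the same pre-activation values, whereas a plain transpose of the first fully connected weight does not. The same layout difference applies to the network input: a frame stack shaped (4, 84, 84) for Chainer would need to be fed to the Keras model as (84, 84, 4).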