Model get stuck by using MirroredStrategy()

462 views Asked by At

Im trying to use multiple GPU(2) in Keras, using MirroredStrategy() from Tensorflow. However it leads to the following error:

Epoch 1/5
WARNING:tensorflow:From /home/user/conda36/lib/python3.6/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
2020-09-06 18:50:54.766930: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-06 18:50:55.069925: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-06 18:50:56.049400: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at depthwise_conv_op.cc:386 : Invalid argument: Computed output size would be negative: -1 [input_size: 3, effective_filter_size: 5, stride: 1]

At this point the model gets frozen. That means it still running but it does nothing.

If I run it without MirroredStrategy() it works totally fine, but of course it only uses 1 GPU.

I use MirroredStrategy() like this:

def get_model():
    .
    .
    .
    decoded = Conv2D(1, (3, 3), activation='linear', padding='same')(d)

    autoencoder = Model(input_img, decoded)
    autoencoder.summary()
    autoencoder.compile(optimizer='Adagrad', loss='mean_squared_error')
    return autoencoder

if __name__ == '__main__':
    # Model
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        autoencoder = get_model()

What could be the error?

0

There are 0 answers