TensorFlow mixed precision training: Conv2DBackpropFilter not using TensorCore


I am using the Keras mixed precision API in order to fit my networks in GPU memory. In my code this typically looks like the following MWE:

from tensorflow.keras.mixed_precision import experimental as mixed_precision

use_mixed_precision = True

# mixed_float16 runs computations in float16 while keeping variables in float32
if use_mixed_precision:
  policy_type = 'mixed_float16'
else:
  policy_type = 'float32'
policy = mixed_precision.Policy(policy_type)
mixed_precision.set_policy(policy)
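
For completeness, the same setup with the non-experimental API (available from TF 2.4 onward) would look roughly like this; it is only a sketch reusing the use_mixed_precision flag from above:

import tensorflow as tf

# Set the global dtype policy with the stable (non-experimental) API.
tf.keras.mixed_precision.set_global_policy(
    'mixed_float16' if use_mixed_precision else 'float32')
print(tf.keras.mixed_precision.global_policy())  # expect "mixed_float16"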

This seems to have the desired effect: when I train my model and profile it with the TensorBoard callback, a large chunk of my ops run in half precision, and some of them use TensorCores (my GPU has a compute capability of at least 7.0).
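
In case it matters, the profile can also be captured programmatically instead of through the callback. A rough, self-contained sketch (the tiny Dense model is only a stand-in for a few training steps):

import tensorflow as tf

# Capture a trace around a short fit() and open 'logs/profile' in
# TensorBoard's Profile tab.
toy_model = tf.keras.Sequential([tf.keras.layers.Dense(8)])
toy_model.compile(loss='mse', optimizer='sgd')
x = tf.random.normal([8, 8])
y = tf.random.normal([8, 8])

tf.profiler.experimental.start('logs/profile')
toy_model.fit(x, y, epochs=1, verbose=0)
tf.profiler.experimental.stop()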

However, Conv2DBackpropFilter does not use TensorCores, even though, according to the TensorBoard information, it is eligible to.

(Screenshot: TensorCore ops)

I don't have a minimal reproducible example for the whole thing yet (I can work on one if needed), but I first wanted to know whether this is expected behaviour or whether there are known gotchas, since I couldn't find any information online.

EDIT

I now have an MRE that shows a different behaviour but raises the same question: why are TensorCores not used, even though all the dimensions that need to be are multiples of 8?

import tensorflow as tf
from tensorflow.keras.mixed_precision import experimental as mixed_precision

use_mixed_precision = True

if use_mixed_precision:
    policy_type = 'mixed_float16'
else:
    policy_type = 'float32'
policy = mixed_precision.Policy(policy_type)
mixed_precision.set_policy(policy)

nf = 8  # number of filters, chosen as a multiple of 8
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=nf, kernel_size=3, padding='same'),
    tf.keras.layers.Conv2D(filters=nf, kernel_size=3, padding='same'),
    tf.keras.layers.Conv2D(filters=nf, kernel_size=3, padding='same'),
])
model.compile(loss='mse', optimizer='sgd')

bs = 8  # batch size, also a multiple of 8
inputs = tf.random.normal([bs, 32, 32, 1])
outputs = tf.random.normal([bs, 32, 32, nf])

tboard_cback = tf.keras.callbacks.TensorBoard(
    profile_batch='5, 10',
    log_dir='logs',
    histogram_freq=0,
    write_graph=False,
    write_images=False,
)

model.fit(inputs, outputs, callbacks=[tboard_cback], epochs=15)

In this MRE, 64.2% of the op time is spent in half precision, so mixed precision is indeed kicking in. My logs also show the compute capability check:

INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK
Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: Tesla V100-SXM2-32GB, compute capability 7.0

Yet none of the ops (this time it's not just Conv2DBackpropFilter) use TensorCores, and I don't understand why.

(Screenshot: tensorcore_for_mre)
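
For reference, here is a quick sketch (reusing model and inputs from the MRE above) that prints the channel dimensions TensorCore kernels care about, together with each layer's compute and variable dtypes, to double-check that the policy is really applied:

_ = model(inputs)  # build the layers so the kernel shapes are known
for layer in model.layers:
    kh, kw, in_ch, out_ch = layer.kernel.shape
    print(layer.name,
          'in channels:', int(in_ch),
          'out channels:', int(out_ch),
          'compute dtype:', layer.compute_dtype,    # float16 under mixed_float16
          'variable dtype:', layer.variable_dtype)  # float32 under mixed_float16
print('batch size:', int(inputs.shape[0]))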

There are 0 answers