I am using TensorFlow 2.10 on Windows with an NVIDIA RTX 2060 SUPER (which has tensor cores) for deep learning. But when I enable mixed precision (float16), the time per epoch actually gets slower instead of faster.
Code:
import tensorflow as tf
import ssl
# Workaround for SSL certificate errors when downloading the CIFAR-100 dataset
ssl._create_default_https_context = ssl._create_unverified_context
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.cifar100.load_data()
# Enable mixed precision before the model is built so the layers pick up the policy
tf.keras.mixed_precision.set_global_policy("mixed_float16")
model = tf.keras.Sequential([
tf.keras.layers.Lambda(lambda x : x / 255, input_shape=(32,32,3)),
tf.keras.layers.Conv2D(filters=64, kernel_size=(4,4)),
tf.keras.layers.MaxPool2D(),
tf.keras.layers.Conv2D(filters=32, kernel_size=(2,2)),
tf.keras.layers.MaxPool2D(),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(4096, activation="relu"),
tf.keras.layers.Dense(4096, activation="relu"),
tf.keras.layers.Dense(4096, activation="relu"),
tf.keras.layers.Dense(4096, activation="relu"),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(100),
tf.keras.layers.Activation("softmax", dtype="float32")  # keep the output in float32 for numerical stability
])
model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
print("compute dtype of first layer: ", model.layers[0].compute_dtype)
model.fit(train_x, train_y, epochs=100, batch_size=1020)
model.evaluate(test_x, test_y)
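For what it's worth, the policy does seem to be applied. A quick sanity check I can run against the model built above (a minimal sketch using only standard Keras layer attributes) is:

# The global policy should be "mixed_float16": hidden layers compute in float16
# while keeping float32 variables, and only the final softmax computes in float32.
print("global policy:", tf.keras.mixed_precision.global_policy().name)
for layer in model.layers:
    print(layer.name, "compute:", layer.compute_dtype, "variables:", layer.variable_dtype)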
I have attached screenshots of the problem: here is an image of training without mixed precision, and here is an image of training with mixed precision, which is slower.
Running the same code in Google Colab, which uses a more modern version of TensorFlow (TF 2.15), works fine: training is faster with mixed precision than without it, as it should be. Here's the link to the Colab: Google Colab
I'm not an expert with TensorFlow and I have been trying to fix this for weeks; any help would be appreciated. Thanks!
Other Information:
I'm using cuDNN 8.1.1 and CUDA 11.2, which are technically the compatible versions for TF 2.10 on Windows.
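In case it's useful, this is roughly how I checked which CUDA/cuDNN versions my TensorFlow build expects and that the GPU is visible (a small sketch; tf.sysconfig.get_build_info() reports the build-time versions, which may differ from what is installed system-wide):

import tensorflow as tf

# Versions TensorFlow itself was built against
build = tf.sysconfig.get_build_info()
print("built with CUDA:", build.get("cuda_version"))
print("built with cuDNN:", build.get("cudnn_version"))

# Confirm the RTX 2060 SUPER is actually visible to TensorFlow
print("GPUs visible:", tf.config.list_physical_devices("GPU"))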
The solution I found was to switch to Ubuntu (Linux) and update to the newer TensorFlow 2.15.
With this version, mixed precision (float16) is about twice as fast as the classic float32.
I also upgraded from CUDA 11.2 to 12.2 and from cuDNN 8.1.1 to 8.9.
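For reference, this is roughly how I compared the two setups after upgrading (a minimal timing sketch on a trimmed-down copy of the model above, not exact numbers; note that the first epoch also includes graph tracing, so a warm-up epoch gives cleaner timings):

import time
import tensorflow as tf

(train_x, train_y), _ = tf.keras.datasets.cifar100.load_data()

def time_one_epoch(policy_name):
    # Rebuild a smaller version of the model under the given dtype policy
    # and time a single training epoch.
    tf.keras.mixed_precision.set_global_policy(policy_name)
    model = tf.keras.Sequential([
        tf.keras.layers.Lambda(lambda x: x / 255, input_shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(filters=64, kernel_size=(4, 4)),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(4096, activation="relu"),
        tf.keras.layers.Dense(100),
        tf.keras.layers.Activation("softmax", dtype="float32"),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=["accuracy"])
    start = time.time()
    model.fit(train_x, train_y, epochs=1, batch_size=1020, verbose=0)
    return time.time() - start

print("float32:", time_one_epoch("float32"), "seconds")
print("mixed_float16:", time_one_epoch("mixed_float16"), "seconds")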