I'm working on an eye-tracking program using the OpenCV, dlib and TensorFlow libraries, and I'm running into an issue where a Keras function uses the CPU instead of the GPU.
My setup
I'm working on a Jetson AGX Xavier (JetPack 4.4) running Ubuntu, with CUDA version 10.2.89. The libraries were installed according to these links:
- OpenCV: https://github.com/mdegans/nano_build_opencv
- Dlib: https://medium.com/@tran.minh.hoang.april/install-dlib-with-cuda-9-0-34c0f61fcf74
- Tensorflow + Keras: https://forums.developer.nvidia.com/t/official-tensorflow-for-jetson-agx-xavier/65523
The problem
Well, my code runs fine, so this is not a code issue. The problem is that one of its key functions runs on the CPU instead of the GPU, which strongly impacts performance. This function is the predict function from TensorFlow Keras.
I was able to monitor the GPU usage with the jtop command, and it is close to 0. So I started digging into why.
The things I tried
I started digging by first checking the devices available to TensorFlow. I ran the following:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
And it gave me:
[name: "/device:CPU:0"
device_type: "CPU"
...
name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
...
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
...
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
...
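For reference, the same check can also be done with the newer tf.config API (a quick sketch, assuming a TF 2.x install):

import tensorflow as tf

# List the physical GPUs TensorFlow can see; on the Xavier this should
# report one GPU entry if the install is correct.
print(tf.config.list_physical_devices('GPU'))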
So I assumed that, at least, TensorFlow recognizes my GPU. Then I made another test:
import tensorflow as tf

if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")
And it gave me:
Name: /device:GPU:0
So at this point everything seems OK. I pushed forward by activating TensorFlow's device placement logging:
tf.debugging.set_log_device_placement(True)
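As a sanity check for the logging itself, a trivial op should show up as placed on the GPU (a small sketch, not part of my actual program):

import tensorflow as tf

tf.debugging.set_log_device_placement(True)

# With placement logging on, this should print a line along the lines of
# "Executing op MatMul in device .../device:GPU:0" before the result.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
print(tf.matmul(a, b))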
I tracked down the two TensorFlow functions used in my program to check the detailed logs. The first one is called like this in my program, and it is called just once:
model = tf.keras.models.load_model('2018_12_17_22_58_35.h5', compile=True)
The associated logs are:
...
2020-10-15 22:40:31.591951: I tensorflow/core/common_runtime/eager/execute.cc:501] Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
2020-10-15 22:40:31.633533: I tensorflow/core/common_runtime/eager/execute.cc:501] Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
2020-10-15 22:40:31.636725: I tensorflow/core/common_runtime/eager/execute.cc:501] Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
2020-10-15 22:40:31.666428: I tensorflow/core/common_runtime/eager/execute.cc:501] Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
2020-10-15 22:40:31.670077: I tensorflow/core/common_runtime/eager/execute.cc:501] Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
...
So it appears that this function uses the GPU. The other function, predict, is called like this:
pred_l = model.predict(eye_input)
The logs are:
...
RepeatDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-10-15 22:40:38.143067: I tensorflow/core/common_runtime/eager/execute.cc:501] Executing op ZipDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-10-15 22:40:38.161602: I tensorflow/core/common_runtime/eager/execute.cc:501] Executing op ParallelMapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-10-15 22:40:38.163806: I tensorflow/core/common_runtime/eager/execute.cc:501] Executing op ModelDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-10-15 22:40:38.179115: I tensorflow/core/common_runtime/eager/execute.cc:501] Executing op RangeDataset in device /job:localhost/replica:0/task:0/device:CPU:0
...
In this case, the logs say this function uses the CPU, which is consistent with my initial analysis. Since this function is called in the while loop (to apply it to every image), it is crucial to run it on the GPU for performance. I tried to force GPU usage with
with tf.device('/device:GPU:0'):
But it still does not work.
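For reference, the forced placement looked roughly like this (a sketch of the relevant part; eye_input is the preprocessed eye crop for the current frame):

import tensorflow as tf

model = tf.keras.models.load_model('2018_12_17_22_58_35.h5', compile=True)

# Inside the frame loop: try to pin the forward pass to the GPU explicitly.
with tf.device('/device:GPU:0'):
    pred_l = model.predict(eye_input)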
Since I followed the official NVIDIA instructions to install the libraries, and since the official website indicates that TensorFlow will use the GPU by default when one is available, I don't think it is an installation problem.
Does anyone have a solution to this?
Thanks.