How to convert a U-Net segmentation model to TensorRT on NVIDIA Jetson Nano? (process killed error)


I trained a U-Net segmentation model with Keras (TensorFlow backend). I am trying to convert its frozen graph (.pb) to TensorRT format on the Jetson Nano, but the process is killed (see the log below). Other posts suggest this could be related to an "out of memory" problem. For context, I already have an SSD MobileNet V2 model running on the Jetson Nano.

If I stop the service I run with systemctl, I can run inference with the U-Net model without converting it to TensorRT (just loading the frozen graph with TensorFlow). This does not work while the service is running (that is, while the other neural network is running), so I am trying to convert the U-Net model to TensorRT to get an optimized version of it. That conversion fails because the process is killed, and it may not be the right approach anyway.

Is it possible to run two neural networks on a Jetson Nano? Is there another way to do this?

For reference, here is how I try to convert the frozen graph to TensorRT:

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_gd, # Pass the parsed graph def here
    outputs=['conv2d_24/Sigmoid'],
    max_batch_size=1,
    max_workspace_size_bytes=1 << 32, # I have tried 25 and 32 here
    precision_mode='FP16'
)
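
As a quick sanity check on the two shift values mentioned in the comment above, the sketch below works out what `1 << 25` and `1 << 32` mean in bytes. The helper names `workspace_bytes` and `to_mib` are hypothetical and only used for illustration. The comparison against total memory assumes the 4 GB Jetson Nano, where CPU and GPU share the same RAM; on that board, `1 << 32` asks for a workspace as large as the whole memory.

```python
def workspace_bytes(shift: int) -> int:
    """Return the byte count expressed as `1 << shift` (hypothetical helper)."""
    return 1 << shift


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (hypothetical helper)."""
    return num_bytes / (1024 * 1024)


# Assumption: the 4 GB Jetson Nano, whose RAM is shared between CPU and GPU.
ASSUMED_TOTAL_RAM_MIB = 4096

for shift in (25, 32):
    size = workspace_bytes(shift)
    share = to_mib(size) / ASSUMED_TOTAL_RAM_MIB
    print(f"1 << {shift} = {size} bytes = {to_mib(size):.0f} MiB "
          f"({share:.1%} of the assumed total RAM)")
```

A value somewhere between these two extremes may be worth trying, but the right number depends on how much memory is actually free while the other model is loaded.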

And here is the output up to the point where the process is killed (conversion of the U-Net frozen graph to TensorRT):

2020-10-05 16:00:58.200269: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2

WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.

2020-10-05 16:01:11.976893: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7

2020-10-05 16:01:11.994472: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7

WARNING:tensorflow:

The TensorFlow contrib module will not be included in TensorFlow 2.0.

For more information, please see:

* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md

* https://github.com/tensorflow/addons

* https://github.com/tensorflow/io (for I/O related ops)

If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From convert_pb_to_tensorrt.py:14: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

2020-10-05 16:01:13.678101: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7

2020-10-05 16:01:15.506432: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1

2020-10-05 16:01:15.512224: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:15.512359: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0

2020-10-05 16:01:15.512638: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session

2020-10-05 16:01:15.532712: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency

2020-10-05 16:01:15.533264: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x328fd900 initialized for platform Host (this does not guarantee that XLA will be used). Devices:

2020-10-05 16:01:15.533318: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version

2020-10-05 16:01:15.632451: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:15.632757: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x30d0edb0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:

2020-10-05 16:01:15.632808: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3

2020-10-05 16:01:15.633163: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:15.633276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties: 

name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216

pciBusID: 0000:00:00.0

2020-10-05 16:01:15.633348: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2

2020-10-05 16:01:15.633500: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10

2020-10-05 16:01:15.716786: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10

2020-10-05 16:01:15.903326: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10

2020-10-05 16:01:16.060655: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10

2020-10-05 16:01:16.141950: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10

2020-10-05 16:01:16.142219: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8

2020-10-05 16:01:16.142553: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:16.142878: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:16.142991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0

2020-10-05 16:01:16.143133: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2

2020-10-05 16:01:27.700226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:

2020-10-05 16:01:27.700377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0 

2020-10-05 16:01:27.700417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N 

2020-10-05 16:01:27.713559: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:27.713897: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:27.714101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 200 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)

Killed
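
A bare `Killed` with no Python traceback typically means the process received SIGKILL, which on Linux is often the kernel's out-of-memory killer. One way to test that hypothesis is to look at how much memory is available right before starting the conversion. The following is a minimal sketch that reads `/proc/meminfo`, assuming a Linux system such as the Jetson's; the function names `parse_meminfo` and `available_mib` are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a {field_name: value} dict.

    Values are the raw integers from the file (kB for most fields).
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_mib(text: str) -> float:
    """Return the MemAvailable field in MiB, or 0.0 if it is missing."""
    return parse_meminfo(text).get("MemAvailable", 0) / 1024


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(f"MemAvailable: {available_mib(f.read()):.0f} MiB")
    except OSError as exc:
        # Not a Linux system, or /proc is not readable.
        print(f"Could not read /proc/meminfo: {exc}")
```

Running this once with the other model's service active and once with it stopped would show how much headroom the conversion actually has in each case.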

There is 1 answer

Youssef MOUNTASSIR On

If the model has unsupported layers, a pure TensorRT conversion will not succeed. In that case, using TensorFlow's version of TRT (TF-TRT) can still yield results, because it handles unsupported layers well: they are executed by TensorFlow alongside the layers that were converted to TensorRT.

Hope this answer is close to your problem. TensorRT is a messy ecosystem.