tensorflow segmentation fault in Nvidia Xavier Jetson when trying to load model with memory growth enabled

Question


785 views Asked by Diogo Dinis At 15 April 2021 at 15:10

I get a segmentation fault with a very specific code sequence, and only on the Jetson Xavier:

import os
import requests
import tensorflow as tf

# 1: enable memory growth on the first GPU
print('SET MEMORY GROWTH')
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

# 2: download a large file over HTTP
print('REQUESTS GET')
requests.get('https://speed.hetzner.de/100MB.bin')

# 3: run a shell command in a child process
command = 'ls'
print(f'SYSTEM CALL ({command})')
os.system(command)

# 4: load the Keras model (the segmentation fault happens here)
print('MODEL LOAD')
model = tf.keras.models.load_model('mnv2_xavier.h5')

If I remove any one of these steps, the code runs without issues. I don't know whether other code sequences lead to the same behavior, but I am fairly sure they exist.
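
To check systematically which combinations of steps crash, one option is to run every subset of the steps in its own interpreter and record which runs are killed by a signal. The following is only a minimal sketch under stated assumptions: `run_steps` and `crashing_subsets` are hypothetical helper names, each step is passed as a string of source code, and the "negative exit code means killed by a signal" convention applies to POSIX systems.

```python
import itertools
import signal
import subprocess
import sys


def run_steps(snippets, preamble=""):
    """Run the snippets in order in a fresh interpreter and return its exit code."""
    source = "\n".join([preamble, *snippets])
    result = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode


def crashing_subsets(steps, preamble=""):
    """Try every non-empty subset of steps, keeping their original order.

    `steps` maps a step name to its source code. Returns a list of
    (step_names, signal_name) pairs for the runs that were killed by a signal.
    """
    names = list(steps)
    crashed = []
    for size in range(1, len(names) + 1):
        for combo in itertools.combinations(names, size):
            code = run_steps([steps[name] for name in combo], preamble)
            if code < 0:  # POSIX: a negative code means "killed by signal -code"
                crashed.append((combo, signal.Signals(-code).name))
    return crashed
```

For the script above, `preamble` would hold the three import lines and `steps` would map a name to the source of each numbered section. With four steps that is 15 runs, each in a clean process, so one crash cannot affect the next run.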

I am trying to figure out the reason for the segmentation fault but, so far, I have had no luck.

I think it may be related to the TensorFlow memory growth policy and to the fact that the Jetson Xavier shares memory between the CPU and the GPU.

I would like to know whether there is a fix or a workaround for this problem, and whether someone can explain this behavior.

Notes:

Code to create this model:

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Model
from tensorflow.keras import Input

x = Input((224,244,3))
y = MobileNetV2()(x)
model = Model(x,y)
model.save('mnv2_xavier.h5')

Versions:

Jetpack 4.4
tensorflow 2.3.0
keras 2.4.0
python 3.6.9

Output:

2021-04-15 16:51:22.031610: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
SET MEMORY GROWTH
2021-04-15 16:51:25.349940: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-04-15 16:51:25.374098: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:949] ARM64 does not support NUMA - returning NUMA node zero
2021-04-15 16:51:25.374309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.18GiB deviceMemoryBandwidth: 82.08GiB/s
2021-04-15 16:51:25.374437: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-04-15 16:51:25.377470: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-04-15 16:51:25.379874: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-04-15 16:51:25.380541: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-04-15 16:51:25.383268: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-04-15 16:51:25.385455: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-04-15 16:51:25.385918: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-04-15 16:51:25.386201: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:949] ARM64 does not support NUMA - returning NUMA node zero
2021-04-15 16:51:25.386633: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:949] ARM64 does not support NUMA - returning NUMA node zero
2021-04-15 16:51:25.386723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
REQUESTS GET
SYSTEM CALL (ls)
code          logs          logs2
bashc.sh      main-log.log  tests
Desktop       Documents     mnv2_xavier.h5
Downloads     model.py      Music
Videos        Pictures      go  
Public        segfault.py 
MODEL LOAD
2021-04-15 16:51:29.542399: W tensorflow/core/platform/profile_utils/cpu_utils.cc:108] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2021-04-15 16:51:29.543521: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xcbba840 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-04-15 16:51:29.543595: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Segmentation fault (core dumped)

Original Q&A

There is 1 answer

**Mert Bacaksız** · Answer 1 · 2022-06-16T15:52:47+00:00

This error happens because the system is trying to use more memory than it should. When the system does not allow this, the process dies with a segmentation fault. First, find where it crashes by running the script under gdb:

$ gdb python3
(gdb) run pythonfile.py
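
As a complement to gdb, Python's standard `faulthandler` module can print the Python-level traceback when the interpreter receives SIGSEGV, which shows which Python line was executing at the time of the crash. A minimal sketch, assuming a POSIX system; `run_with_faulthandler` is a hypothetical helper name, not something from the question:

```python
import subprocess
import sys


def run_with_faulthandler(script_path, *args):
    """Run a script with faulthandler enabled; return (exit code, stderr text)."""
    result = subprocess.run(
        # -X faulthandler enables the handler before any user code runs.
        [sys.executable, "-X", "faulthandler", script_path, *args],
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stderr
```

The same effect can be had without a wrapper by running `python3 -X faulthandler pythonfile.py`, or by putting `import faulthandler; faulthandler.enable()` at the top of the script. The traceback only covers Python frames, so gdb is still needed to see the native frames inside TensorFlow.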

If the error points to libapt-pkg5.0, install the appropriate package for your operating system. For Unix-based operating systems (Xavier, Nano, TX2):

$ sudo dpkg --purge --force-depends apt apt-utils libapt-inst2.0:arm64 libapt-pkg5.0:arm64

If the error is still not resolved, open ~/.bashrc:

$ gedit ~/.bashrc

and add the following line:

export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1
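
Setting `LD_PRELOAD` in `~/.bashrc` affects every program started from that shell. To try the preload on a single script first, one option is to set the variable only for a child process. This is a sketch, not a tested fix for this crash: `env_with_preload` and `run_with_preload` are hypothetical helper names, and the library path is simply the one quoted above.

```python
import os
import subprocess
import sys

# Path quoted in the answer; adjust it if libgomp lives elsewhere on your system.
LIBGOMP = "/usr/lib/aarch64-linux-gnu/libgomp.so.1"


def env_with_preload(library, base_env=None):
    """Return a copy of the environment with `library` prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env


def run_with_preload(script_path, library=LIBGOMP):
    """Run a Python script in a child process that preloads `library`."""
    completed = subprocess.run(
        [sys.executable, script_path], env=env_with_preload(library)
    )
    return completed.returncode
```

The variable has to be in place before the process starts, which is why the script runs as a child process instead of setting `os.environ` inside the already running interpreter.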
