Tracing a segmentation fault in a 3rd party library: cv::ImageCodecInitializer destructor crashes

We're developing a framework that directly uses mrpt-1.9, which in turn uses OpenCV 2.4. While writing unit tests, we found that they segfault when the tests exit (i.e., during cleanup) with an OpenCV error in cv::String::deallocate()

What I have tried:

running with valgrind

==26159== Conditional jump or move depends on uninitialised value(s)
==26159==    at 0x7DB7F5: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FB0: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159==    by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159==    by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159==    by 0x807A044: exit (exit.c:104)
==26159==    by 0x8060836: (below main) (libc-start.c:325)
==26159== 
==26159== Invalid read of size 4
==26159==    at 0x7DB7FB: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FB9: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159==    by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159==    by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159==    by 0x807A044: exit (exit.c:104)
==26159==    by 0x8060836: (below main) (libc-start.c:325)
==26159==  Address 0x1a is not stack'd, malloc'd or (recently) free'd
==26159== 
==26159== 
==26159== Process terminating with default action of signal 11 (SIGSEGV)
==26159==  Access not within mapped region at address 0x1A
==26159==    at 0x7DB7FB: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FB9: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159==    by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159==    by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159==    by 0x807A044: exit (exit.c:104)
==26159==    by 0x8060836: (below main) (libc-start.c:325)
==26159==  If you believe this happened as a result of a stack
==26159==  overflow in your program's main thread (unlikely but
==26159==  possible), you can try to increase the size of the
==26159==  main thread stack using the --main-stacksize= flag.
==26159==  The main thread stack size used in this run was 8388608.
==26159== 
==26159== HEAP SUMMARY:
==26159==     in use at exit: 286,067 bytes in 1,147 blocks
==26159==   total heap usage: 7,469 allocs, 6,322 frees, 1,912,969 bytes allocated
==26159== 
==26159== LEAK SUMMARY:
==26159==    definitely lost: 0 bytes in 0 blocks
==26159==    indirectly lost: 0 bytes in 0 blocks
==26159==      possibly lost: 2,299 bytes in 27 blocks
==26159==    still reachable: 283,768 bytes in 1,120 blocks
==26159==                       of which reachable via heuristic:
==26159==                         newarray           : 1,536 bytes in 16 blocks
==26159==         suppressed: 0 bytes in 0 blocks
==26159== Rerun with --leak-check=full to see details of leaked memory
==26159== 
==26159== For counts of detected and suppressed errors, rerun with: -v
==26159== Use --track-origins=yes to see where uninitialised values come from
==26159== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

AFAIK this could be either us calling an MRPT function incorrectly, or a bug in MRPT itself.

running it with gdb:

I've been trying to debug it in gdb, but I can only get as far as the backtrace; it does not tell me which part of our code is responsible. Since the crash seems to happen after main exits, it is really confusing. Even worse, the class we construct (but do not actually do anything with) does not contain any MRPT classes or objects, so I am guessing the problem is in the MRPT libraries and not in our framework.

Thread 1 "debug" received signal SIGSEGV, Segmentation fault.
0x00000000005b569b in cv::String::deallocate() ()
(gdb) bt
#0  0x00000000005b569b in cv::String::deallocate() ()
#1  0x000000000089969a in cv::BmpEncoder::~BmpEncoder() ()
#2  0x00000000008996d9 in cv::BmpEncoder::~BmpEncoder() [clone .localalias.25] ()
#3  0x00007ffff36a4f66 in cv::ImageCodecInitializer::~ImageCodecInitializer() () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#4  0x00007ffff484136a in __cxa_finalize (d=0x7ffff38d1000) at cxa_finalize.c:56
#5  0x00007ffff369fb53 in ?? () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#6  0x00007fffffffd8b0 in ?? ()
#7  0x00007ffff7de7de7 in _dl_fini () at dl-fini.c:235
Backtrace stopped: frame did not save the PC

I set a breakpoint with break cv::ImageCodecInitializer::~ImageCodecInitializer

and I got as far as:

Thread 1 "debug" hit Breakpoint 3, 0x0000000000888ad0 in cv::ImageCodecInitializer::~ImageCodecInitializer() ()
(gdb) bt
#0  0x0000000000888ad0 in cv::ImageCodecInitializer::~ImageCodecInitializer() ()
#1  0x00007ffff4840ff8 in __run_exit_handlers (status=0, listp=0x7ffff4bcb5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#2  0x00007ffff4841045 in __GI_exit (status=<optimised out>) at exit.c:104
#3  0x00007ffff4827837 in __libc_start_main (main=0x5a4536 <main()>, argc=1, argv=0x7fffffffd9d8, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7fffffffd9c8) at ../csu/libc-start.c:325
#4  0x00000000005a4469 in _start ()

searched for opencv-2.4 debug

The app is built with debug symbols, but the system does not appear to have opencv-2.4 packages with debug symbols, so I keep getting the "optimized out" warning.

libopencv-apps-dev - opencv_apps Robot OS package - development files
libopencv-apps0d - opencv_apps Robot OS package - runtime files
libopencv-calib3d2.4v5 - computer vision Camera Calibration library
libopencv-contrib-dev - development files for libopencv-contrib
libopencv-contrib2.4v5 - computer vision contrib library
libopencv-core2.4v5 - computer vision core library
libopencv-dev - development files for opencv
libopencv-features2d2.4v5 - computer vision Feature Detection and Descriptor Extraction library
libopencv-flann2.4v5 - computer vision Clustering and Search in Multi-Dimensional spaces library
libopencv-gpu-dev - development files for libopencv-gpu2.4v5
libopencv-gpu2.4v5 - computer vision GPU library
libopencv-highgui2.4v5 - computer vision High-level GUI and Media I/O library
libopencv-imgproc2.4v5 - computer vision Image Processing library
libopencv-legacy-dev - development files for libopencv-legacy
libopencv-legacy2.4v5 - computer vision legacy library
libopencv-ml2.4v5 - computer vision Machine Learning library
libopencv-objdetect2.4v5 - computer vision Object Detection library
libopencv-ocl-dev - development files for libopencv-ocl2.4v5
libopencv-ocl2.4v5 - computer vision OpenCL support library
libopencv-photo2.4v5 - computer vision computational photography library
libopencv-stitching2.4v5 - computer vision image stitching library
libopencv-superres2.4v5 - computer vision Super Resolution library
libopencv-ts2.4v5 - computer vision ts library
libopencv-video2.4v5 - computer vision Video analysis library
libopencv-videostab2.4v5 - computer vision video stabilization library
libopencv2.4-java - Java bindings for the computer vision library
libopencv2.4-jni - Java jni library for the computer vision library

searched for actual point of offending function

I've gone through the minified debug executable we've built in order to try and pin-point the issue, and then tried searching for the actual function:

nm -Ca debug | grep "ImageCodecInitializer"
0000000000889290 W cv::ImageCodecInitializer::ImageCodecInitializer()
0000000000889290 W cv::ImageCodecInitializer::ImageCodecInitializer()
0000000000888ad0 W cv::ImageCodecInitializer::~ImageCodecInitializer()
0000000000888ad0 W cv::ImageCodecInitializer::~ImageCodecInitializer()

Then I tried to find what GDB has to say about those addresses:

(gdb) info line *0x0000000000889290
No line number information available for address 0x889290 <_ZN2cv21ImageCodecInitializerC2Ev>

But I can't go anywhere from there, so I searched in GDB to find who constructs this:

#0  0x00007ffff36a6240 in cv::ImageCodecInitializer::ImageCodecInitializer() () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#1  0x00007ffff369f8f6 in ?? () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#2  0x00007ffff7de76ba in call_init (l=<optimised out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffd9d8, env=env@entry=0x7fffffffd9e8) at dl-init.c:72
#3  0x00007ffff7de77cb in call_init (env=0x7fffffffd9e8, argv=0x7fffffffd9d8, argc=1, l=<optimised out>) at dl-init.c:30
#4  _dl_init (main_map=0x7ffff7ffe168, argc=1, argv=0x7fffffffd9d8, env=0x7fffffffd9e8) at dl-init.c:120
#5  0x00007ffff7dd7c6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6  0x0000000000000001 in ?? ()
#7  0x00007fffffffdda0 in ?? ()
#8  0x0000000000000000 in ?? ()

Again optimized out.
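
When symbol names alone are ambiguous (the nm output above shows the executable carrying its own weak copies of the ImageCodecInitializer constructor and destructor, while gdb also reports a copy in libopencv_highgui.so.2.4), one way to tell which loaded module an address belongs to is to ask the dynamic loader. The following is a minimal sketch, assuming Linux with glibc; the helper name object_containing is hypothetical, and dladdr is the only library call it relies on:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // dladdr is exposed as an extension on glibc
#endif
#include <dlfcn.h>

#include <cstdlib>
#include <string>

// Hypothetical helper: returns the path of the loaded object (executable or
// shared library) whose mapping contains addr, or an empty string when the
// address does not fall inside any loaded object.
std::string object_containing(const void* addr) {
    Dl_info info{};
    if (dladdr(addr, &info) != 0 && info.dli_fname != nullptr) {
        return info.dli_fname;
    }
    return {};
}
```

Called on the address of a function, it reports the file that actually provides the code at that address, which may differ from the library you expected if the same symbol is defined in more than one place. Older glibc versions may need -ldl when linking.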

searched for library which uses the offending function

The function is in libopencv_highgui.so.2.4, so I am guessing that one of the MRPT libraries uses it. I went looking for which of the MRPT libraries we link against depends on it, and found it:

readelf -d debug 

Dynamic section at offset 0x2b49bb0 contains 41 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libboost_system.so.1.58.0]
 0x0000000000000001 (NEEDED)             Shared library: [libboost_filesystem.so.1.58.0]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libmrpt-base.so.1.9]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libjpeg.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libpng12.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libtiff.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libjasper.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libIlmImf-2_2.so.22]
 0x0000000000000001 (NEEDED)             Shared library: [libHalf.so.12]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

So, I found that:

sudo ldconfig -p | grep "libmrpt-base.so.1.9"
        libmrpt-base.so.1.9 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libmrpt-base.so.1.9

And then:

readelf -d /usr/lib/x86_64-linux-gnu/libmrpt-base.so.1.9

Dynamic section at offset 0xa5aea8 contains 37 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libcxsparse.so.3.1.4]
 0x0000000000000001 (NEEDED)             Shared library: [libwx_baseu-3.0.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libwx_gtk2u_core-3.0.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libjpeg.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_highgui.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgproc.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_core.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [libmrpt-base.so.1.9]

I know this is the library creating the issue, because our own project uses opencv-3.3, statically linked, so the only path to libopencv_highgui.so.2.4 is through MRPT. Sadly, the repository we're using does not have debug symbols for MRPT either:

libmrpt-base1.9 - Mobile Robot Programming Toolkit - base library
libmrpt-detectors1.9 - Mobile Robot Programming Toolkit - detectors library
libmrpt-graphs1.9 - Mobile Robot Programming Toolkit - graphs library
libmrpt-graphslam1.9 - Mobile Robot Programming Toolkit - graphslam library
libmrpt-gui1.9 - Mobile Robot Programming Toolkit - gui library
libmrpt-hmtslam1.9 - Mobile Robot Programming Toolkit - hmtslam library
libmrpt-hwdrivers1.9 - Mobile Robot Programming Toolkit - hwdrivers library
libmrpt-kinematics1.9 - Mobile Robot Programming Toolkit - kinematics library
libmrpt-maps1.9 - Mobile Robot Programming Toolkit - maps library
libmrpt-nav1.9 - Mobile Robot Programming Toolkit - nav library
libmrpt-obs1.9 - Mobile Robot Programming Toolkit - obs library
libmrpt-opengl1.9 - Mobile Robot Programming Toolkit - opengl library
libmrpt-slam1.9 - Mobile Robot Programming Toolkit - slam library
libmrpt-tfest1.9 - Mobile Robot Programming Toolkit - tfest library
libmrpt-topography1.9 - Mobile Robot Programming Toolkit - topography library
libmrpt-vision1.9 - Mobile Robot Programming Toolkit - vision library
libmrpt-comms1.9 - Mobile Robot Programming Toolkit - comms library

And even worse:

nm -C libmrpt-base.so
nm: libmrpt-base.so: no symbols

And this is where the journey ends.

What are my options?

  • use another version of mrpt?
  • compile mrpt with debug symbols?
  • compile opencv-2.4 with debug symbols?

Any help, hints or tips are greatly appreciated. If this question is too localized or does not conform to SO standards, please leave a comment and I will update it.

There are 2 answers

Jose Luis Blanco On BEST ANSWER

My first guess is that you might be getting this issue due to the use of two opencv versions at once... Try building mrpt from sources telling CMake to use the same opencv version you use for the main project.

mrpt-base does not directly use anything from highgui (although... it's linked against it! That should be fixed, for sure), so I suspect the error has to do with the initialization of static variables in opencv modules and something wrong with the linker...
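
To illustrate why mixing two versions of the same library in one process can end in a bad pointer, here is a minimal sketch. The two Encoder structs are hypothetical and are not the real OpenCV classes; they only stand in for "the same class name compiled from two versions with different members". The function reads an object built with one layout the way code compiled against the other layout would:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layouts, NOT the real OpenCV definitions.
namespace v2_layout {
struct Encoder {
    char        description[32];  // extra member present only in this version
    const char* buffer;           // pointer a destructor would release
};
}  // namespace v2_layout

namespace v3_layout {
struct Encoder {
    const char* buffer;           // same member, but at offset 0
};
}  // namespace v3_layout

static_assert(offsetof(v2_layout::Encoder, buffer) !=
                  offsetof(v3_layout::Encoder, buffer),
              "the two layouts place 'buffer' at different offsets");

// Returns the bits that code compiled against v3_layout would treat as
// 'buffer' when handed an object that was really built with v2_layout.
std::uintptr_t buffer_as_seen_by_v3(const v2_layout::Encoder& obj) {
    std::uintptr_t bits = 0;
    std::memcpy(&bits, &obj, sizeof bits);  // v3 expects the pointer at offset 0
    return bits;
}
```

The value returned is made of description bytes rather than the real pointer, so a destructor that dereferenced it would touch an arbitrary small address. Whether this is what happens in the crash above is an assumption; building MRPT against the same OpenCV version removes the possibility.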

Cheers

Paul Floyd On

Not really an answer, but comments aren't good for formatting code. The latest opencv on github has the following source

void cv::String::deallocate()
{
    int* data = (int*)cstr_;
    len_ = 0;
    cstr_ = 0;

    if(data && 1 == CV_XADD(data-1, -1))
    {
        cv::fastFree(data-1);
    }
}

(which is probably more recent than your version).

It looks like the string is stored with a reference count in the 4 bytes immediately before the NUL-terminated characters (hence data-1). The if condition checks that the pointer isn't NULL, then does an atomic decrement of the reference count; since CV_XADD returns the value before the decrement, the memory is freed when the count drops from 1 to 0.
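
The same pattern can be sketched in standard C++. This is an illustration under assumptions, not OpenCV's implementation: the rc_* names are hypothetical, std::atomic<int> stands in for CV_XADD, and malloc/free stand in for OpenCV's allocator.

```cpp
#include <atomic>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical layout: [atomic<int> refcount][characters...][NUL]
// Returns a pointer to the characters; the count sits just before them.
char* rc_alloc(const char* text) {
    const std::size_t len = std::strlen(text);
    void* raw = std::malloc(sizeof(std::atomic<int>) + len + 1);
    if (raw == nullptr) throw std::bad_alloc();
    new (raw) std::atomic<int>(1);  // one owner to start with
    char* chars = static_cast<char*>(raw) + sizeof(std::atomic<int>);
    std::memcpy(chars, text, len + 1);
    return chars;
}

// Steps back from the characters to the reference count in front of them.
std::atomic<int>* rc_header(char* chars) {
    return reinterpret_cast<std::atomic<int>*>(chars - sizeof(std::atomic<int>));
}

void rc_retain(char* chars) { rc_header(chars)->fetch_add(1); }

// Same shape as the quoted deallocate(): fetch_add returns the value from
// BEFORE the decrement, so the buffer is freed when that old value was 1.
// Clears the caller's pointer and returns true if the buffer was freed.
bool rc_release(char*& chars) {
    char* data = chars;
    chars = nullptr;
    if (data != nullptr && rc_header(data)->fetch_add(-1) == 1) {
        std::free(rc_header(data));
        return true;
    }
    return false;
}
```

Note that the only guard is a NULL check. If the pointer holds a non-NULL garbage value, stepping back to the count reads from an unmapped address, which would be consistent with the invalid 4-byte read at 0x1A in the valgrind log.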