How to correctly interpose malloc allowing for LD_PRELOAD chaining

I have created a shared library which interposes malloc() and related calls. It works well, with some caveats, but one thing does not work: I expect to be able to chain interposers so that I can run something like

LD_PRELOAD="/path/to/mymalloc.so /usr/lib64/jemalloc.so" some_app

The intention is that instead of forwarding to libc malloc() my library should now forward to jemalloc via RTLD_NEXT.

However, it segfaults, and the stack trace shows my malloc wrapper calling itself ad infinitum, even though the wrapper does not allocate any memory itself when jemalloc is not in use:

#224364 0x00007facb1aef46a in Memory::HybridAllocator<Memory::LibCAllocator, Memory::StaticAllocator>::malloc (this=0x7facb1d0be60 <Memory::getHybridAllocator()::hybrid>, size=72704) at /home/brucea/work/git/libbede/src/main/cpp/memory/Memory/HybridAllocator.h:109
#224365 0x00007facb1aefa8a in malloc (size=72704) at /home/brucea/work/git/libbede/src/main/cpp/memory/Memory/mallocwrap.cpp:11
#224366 0x00007facb1aeeca2 in Memory::LibCAllocator::malloc (this=0x7facb1cf3720 <Memory::getBootstrapAllocator()::bootstrap>, requestSize=72704) at /home/brucea/work/git/libbede/src/main/cpp/memory/Memory/LibCAllocator.h:77
#224367 0x00007facb1aef46a in Memory::HybridAllocator<Memory::LibCAllocator, Memory::StaticAllocator>::malloc (this=0x7facb1d0be60 <Memory::getHybridAllocator()::hybrid>, size=72704) at /home/brucea/work/git/libbede/src/main/cpp/memory/Memory/HybridAllocator.h:109
#224368 0x00007facb1aefa8a in malloc (size=72704) at /home/brucea/work/git/libbede/src/main/cpp/memory/Memory/mallocwrap.cpp:11
#224369 0x00007facb133fc1a in (anonymous namespace)::pool::pool (this=0x7facb163e200 <(anonymous namespace)::emergency_pool>) at ../../../../libstdc++-v3/libsupc++/eh_alloc.cc:123

#224370 __static_initialization_and_destruction_0 (__priority=65535, __initialize_p=1) at ../../../../libstdc++-v3/libsupc++/eh_alloc.cc:262
#224371 _GLOBAL__sub_I_eh_alloc.cc(void) () at ../../../../libstdc++-v3/libsupc++/eh_alloc.cc:338
#224372 0x00007facb1d1b8ba in call_init (l=<optimized out>, argc=argc@entry=4, argv=argv@entry=0x7ffe3ba440e8, env=env@entry=0x7ffe3ba44110) at dl-init.c:72
#224373 0x00007facb1d1b9ba in call_init (env=0x7ffe3ba44110, argv=0x7ffe3ba440e8, argc=4, l=<optimized out>) at dl-init.c:30
#224374 _dl_init (main_map=0x7facb1f3a1d0, argc=4, argv=0x7ffe3ba440e8, env=0x7ffe3ba44110) at dl-init.c:119
#224375 0x00007facb1d0cfda in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#224376 0x0000000000000004 in ?? ()
#224377 0x00007ffe3ba45d7f in ?? ()
#224378 0x00007ffe3ba45ddd in ?? ()
#224379 0x00007ffe3ba45de0 in ?? ()
#224380 0x00007ffe3ba45de4 in ?? ()
#224381 0x0000000000000000 in ?? ()

Debugging in gdb, the cause seems to be that __malloc_hook inside __libc_malloc() is somehow set to point at my implementation of malloc, resulting in infinite recursion. It must be jemalloc doing this somehow.

__GI___libc_malloc (bytes=16) at malloc.c:3037
3037    {
(gdb) s
3042        = atomic_forced_read (__malloc_hook);
(gdb) s
3043      if (__builtin_expect (hook != NULL, 0))
(gdb) s
3044        return (*hook)(bytes, RETURN_ADDRESS (0));
(gdb) s
malloc (size=140737488345424) at /home/brucea/work/git/libbede/src/main/cpp/memory/Memory/mallocwrap.cpp:12

The basic outline of my code is below (in C++ except for the low-level parts, so apologies for any offence caused to C purists):

extern "C" void* malloc(const size_t size) __THROW
{
    return getMyAllocator().malloc(size);
}
// etc. for free() et al

// elsewhere
auto wrap(const char* sym)
{
    static void* libchandle = nullptr;
    auto f = dlsym(RTLD_NEXT,sym);
    if (f == nullptr)
   {
      std::fprintf(stderr, "error: unable to find symbol via dlsym(RTLD_NEXT,%s):\n",sym);
      std::fprintf(stderr, "%s\n",dlerror());
      f = dlsym(RTLD_DEFAULT, sym);
   }
   if (f == nullptr)
   {
      std::fprintf(stderr, "error: unable to find symbol via dlsym(RTLD_DEFAULT,%s):\n",sym);
      std::fprintf(stderr, "%s\n",dlerror());
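      if (libchandle == nullptr)
      {
         // on glibc systems "libc.so" is normally a linker script; dlopen needs the real soname
         libchandle = dlopen("libc.so.6", RTLD_LAZY);
         if (libchandle == nullptr)
         {
            std::fprintf(stderr, "unable to open libc.so.6:\n");
            std::fprintf(stderr, "%s\n",dlerror());
         }
      }
      if (libchandle != nullptr)
      {
         // look the symbol up on every call, not only when the handle is first opened
         f = dlsym(libchandle, sym);
      }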
      if (f == nullptr)
      {
         std::fprintf(stderr, "error: unable to find symbol via dlsym(\"libc\",%s):\n",sym); 
         std::fprintf(stderr, "%s\n",dlerror());
         std::exit(1);
      }
   }
   return f;
}

#define WRAP(X)                                 \
   { \
      static constexpr const char* const symName = #X;                 \
      auto f = reinterpret_cast<decltype(&::X)>(wrap(#X));             \
      this->X##Func = f; \
   } 

// Note: until ForwardingAllocator is setup
// malloc() etc are forwarded to __libc_malloc() etc
ForwardingAllocator::ForwardingAllocator()
{
   WRAP(malloc)
   WRAP(free)
   WRAP(calloc)
   WRAP(realloc)
   WRAP(malloc_usable_size)
}

Lots of stuff omitted for brevity.

Are there any suggestions as to what I might be doing wrong or how I can better diagnose the issue?

It seems that jemalloc itself defines __libc_malloc

>nm /usr/lib/debug/usr/lib64/libjemalloc.so.2-5.2.1-2.el8.x86_64.debug  | grep __libc_malloc
000000000000d4f0 t __libc_malloc

Some further information.

  • malloc_hooks are deprecated so I don't use them.

Complications I have handled with some success:

  • dlsym() uses malloc() - I use a simple bootstrap allocator during startup before switching to the main one which forwards to libc's malloc()

  • I originally used a naive allocator as a bootstrap allocator

  • My wrapper to free() delegates to the appropriate free() depending on which malloc() was in use

  • I have now moved to using __libc_malloc as the bootstrap allocator, while allowing it to be replaced via dlsym as soon as possible.
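
For readers unfamiliar with the bootstrap approach described in the bullets above, here is a minimal sketch of what a naive bootstrap allocator might look like. The class name BootstrapArena, the 64 KiB capacity and the owns() helper are assumptions made for illustration, not the code from my library. It hands out memory from a fixed buffer so that it never calls malloc() itself, and owns() is the kind of check a free() wrapper could use to decide which allocator a pointer came from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bootstrap allocator: serves requests from a fixed buffer and
// never calls malloc(). Blocks are never returned. Not thread-safe as written.
class BootstrapArena
{
public:
    void* allocate(std::size_t size)
    {
        constexpr std::size_t align = alignof(std::max_align_t);
        if (size == 0)
        {
            size = 1; // give zero-sized requests a distinct block
        }
        if (size > capacity)
        {
            return nullptr; // too large; also keeps the rounding below from overflowing
        }
        // Round up so every block keeps max_align_t alignment.
        const std::size_t rounded = (size + align - 1) / align * align;
        if (rounded > capacity - used_)
        {
            return nullptr; // arena exhausted
        }
        void* block = buffer_ + used_;
        used_ += rounded;
        return block;
    }

    // True if ptr lies inside the arena; a free() wrapper can use this to
    // ignore bootstrap blocks and forward everything else.
    bool owns(const void* ptr) const
    {
        const auto p = reinterpret_cast<std::uintptr_t>(ptr);
        const auto begin = reinterpret_cast<std::uintptr_t>(buffer_);
        return p >= begin && p < begin + capacity;
    }

private:
    static constexpr std::size_t capacity = 64 * 1024; // assumed size
    // Both members have constant initializers so that a namespace-scope
    // instance can be initialized statically, before any constructor runs.
    alignas(std::max_align_t) unsigned char buffer_[capacity] = {};
    std::size_t used_ = 0;
};
```

A free() wrapper built on this would return immediately when owns(ptr) is true and forward to the real free() otherwise.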

This is a useful answer - https://stackoverflow.com/a/17850402/1569204

Answer by Bruce Adams (best answer)

Although jemalloc defines a __libc_malloc symbol, it is intended only for static linking with glibc.

When you forward to __libc_malloc in your shared library you are still forwarding to the libc implementation. However, it seems that during startup jemalloc sets the malloc hooks to point to the previous address of malloc(), which in this case is the malloc wrapper in the first library (i.e. yours). After setting up a few things internally, which currently requires 3 calls to malloc(), jemalloc installs itself as the new malloc via the libc malloc hooks.

Unfortunately there is no other symbol exported by glibc that you can use to bypass malloc hooks and use malloc directly. At least on the version I'm using.

You could handle this by setting malloc hooks yourself if you have another malloc replacement to use. However, you have already expressed a desire to "do the right thing" and not use malloc hooks because they are deprecated.

You can handle this without using malloc hooks by detecting recursive calls and providing a path to some other malloc for example:

   unsigned int inMalloc = 0; // recursion depth (not thread-safe as written)
   void* malloc(const size_t size)
   {
      if (inMalloc != 0)
      {
         return handleRecursiveMalloc(size);
      }
      ++inMalloc;
      auto res = this->mainAllocator->malloc(size);
      --inMalloc;
      return res;
   }

   void* handleRecursiveMalloc(size_t size)
   {
      // sbrk() signals failure with (void*)-1, not nullptr
      void* const sbrkFailed = reinterpret_cast<void*>(-1);
      // on success sbrk() returns the previous break, i.e. the start of the new block
      void* previousBreak = sbrk(size);
      if (previousBreak == sbrkFailed)
      {
         return nullptr; // recursion detected and we could not handle it.
      }
      // we now have size bytes of memory starting at previousBreak
      // book-keeping here if required
      //  emergencyAllocSize += size;
      //  numEmergencyAllocations++
      return previousBreak;
   }

This is ugly but it works. The cost to your malloc wrapper is one increment, one decrement and one conditional branch. That probably makes no measurable difference, but you could use the C++20 attribute [[unlikely]] or GCC's __builtin_expect to mark the recursive branch as unlikely to be taken.
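
As a rough illustration of the branch hint, here is a standalone sketch of the same guard written with __builtin_expect. The names guardedMalloc, mainAlloc and emergencyAlloc are placeholders invented for this example. In a real interposer mainAlloc would be the pointer obtained from dlsym and emergencyAlloc something like handleRecursiveMalloc. With C++20 you could write `if (inMalloc != 0) [[unlikely]]` instead.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the real allocators.
using AllocFn = void* (*)(std::size_t);
static AllocFn mainAlloc = nullptr;      // e.g. the result of dlsym(RTLD_NEXT, "malloc")
static AllocFn emergencyAlloc = nullptr; // e.g. an sbrk-based fallback

// Recursion depth. A real interposer would need one counter per thread.
static unsigned int inMalloc = 0;

void* guardedMalloc(std::size_t size)
{
    // __builtin_expect (GCC/Clang) tells the compiler the recursive path is cold.
    if (__builtin_expect(inMalloc != 0, 0))
    {
        return emergencyAlloc(size);
    }
    ++inMalloc;
    void* res = mainAlloc(size);
    --inMalloc;
    return res;
}
```

If mainAlloc ends up calling back into guardedMalloc, as happens with the hook installed by jemalloc, the inner call takes the emergency path and the outer call returns its result, so the recursion stops after one level.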


There is another pitfall to be aware of. If you are forwarding multiple symbols you should check that they are all forwarded safely (typically this means to the same library). For example:

void* f1 = dlsym(RTLD_NEXT,"malloc");
void* f2 = dlsym(RTLD_NEXT,"malloc_usable_size");
// handle failures...
Dl_info info1;
dladdr(f1,&info1);
Dl_info info2;
dladdr(f2,&info2);
// handle failures...
if (info1.dli_fbase != info2.dli_fbase)
{
    // malloc_usable_size() is provided by a different library than malloc()
    // so we probably shouldn't use it
    f2 = nullptr; 
    // set flags accordingly
}

An example of this in practice is electric-fence. If you chain:

LD_PRELOAD="mymalloc.so electric-fence.so"

you find that malloc_usable_size() comes from libc while malloc() comes from electric-fence. Granted, electric-fence is not so common any more.

In this case it would be safer to replace malloc_usable_size() with a dummy function that always returns 0. For example, the normal libc version of malloc_usable_size(ptr) (see https://code.woboq.org/userspace/glibc/malloc/malloc.c.html) reads the chunk header located just before the allocated block (i.e. at ptr-2*sizeof(size_t)). If you give it a ptr that was not allocated by libc's malloc and so has no such header, it could segfault.
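
Putting the dladdr check and the dummy function together might look something like the sketch below. The helper names sameLibrary, dummyUsableSize and chooseUsableSize are made up for this example, and it assumes glibc's dladdr (which g++ exposes by default because it defines _GNU_SOURCE). On older glibc versions you need to link with -ldl.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <dlfcn.h> // dladdr, Dl_info (glibc extension)

// True only if both addresses resolve to the same loaded object.
bool sameLibrary(void* a, void* b)
{
    Dl_info infoA;
    Dl_info infoB;
    // dladdr() returns 0 when the address cannot be matched to a loaded object.
    if (dladdr(a, &infoA) == 0 || dladdr(b, &infoB) == 0)
    {
        return false;
    }
    return infoA.dli_fbase == infoB.dli_fbase;
}

// Hypothetical fallback: always reports that no usable size is known.
std::size_t dummyUsableSize(void*)
{
    return 0;
}

using UsableSizeFn = std::size_t (*)(void*);

// Use the looked-up malloc_usable_size only if it lives in the same library
// as the looked-up malloc; otherwise fall back to the dummy.
UsableSizeFn chooseUsableSize(void* mallocAddr, void* usableSizeAddr)
{
    if (usableSizeAddr != nullptr && sameLibrary(mallocAddr, usableSizeAddr))
    {
        return reinterpret_cast<UsableSizeFn>(usableSizeAddr);
    }
    return &dummyUsableSize;
}
```

You would call chooseUsableSize with the two pointers returned by dlsym(RTLD_NEXT, ...) and store the result in place of the raw malloc_usable_size pointer.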

See, for example, the question "Is it possible to define a symbol dynamically such that it will be found by dlsym?"