Is this a GCC optimiser bug or a feature?

58 views Asked by At

I am observing a strange behaviour where GCC (but not clang) will skip a pointer-reset-after-free expression in the global destruction phase and with certain values of -O?. I can reproduce this with every version I tried (7.2.0, 9.4.0, 12.3.0 and 13.2.0) and I'm trying to understand if this is a bug or not.

The phenomenon occurs in the C++ wrapper for lmdb; here are the functions that are involved directly:

static inline void
lmdb::env_close(MDB_env* const env) noexcept {
  delete ::mdb_env_close(env);
}

class lmdb::env {
protected:
  MDB_env* _handle{nullptr};

public:
  ~env() noexcept {
    try { close(); } catch (...) {}
  }

  MDB_env* handle() const noexcept {
    return _handle;
  }

  void close() noexcept {
    if (handle()) {
      lmdb::env_close(handle());
      _handle = nullptr;
//   std::cerr << " handle now " << handle();
    }
//     std::cerr << "\n";
  }

I use this wrapper in a class LMDBHook of which I create a single, static global variable:

class LMDBHook
{
public:
    ~LMDBHook()
    {
      if (s_envExists) {
          s_lmdbEnv.close();
          s_envExists = false;
          delete[] s_lz4CompState;
      }
    }

    static bool init()
    {
        if (!s_envExists) {
            try {
                s_lmdbEnv = lmdb::env::create();
//snip
            } catch (const lmdb::error &e) {
                // as per the documentation: the environment must be closed even if creation failed!
                s_lmdbEnv.close();
            }
        }
        return false;
    }
//snip
    static lmdb::env s_lmdbEnv;
    static bool s_envExists;
    static char* s_lz4CompState;
    static size_t s_mapSize;
};

static LMDBHook LMDB;

lmdb::env LMDBHook::s_lmdbEnv{nullptr};
bool LMDBHook::s_envExists = false;
char *LMDBHook::s_lz4CompState = nullptr;
// set the initial map size to 64Mb
size_t LMDBHook::s_mapSize = 1024UL * 1024UL * 64UL;

I first came across this issue because an application using LMDBHook would crash just before exit when built with GCC but not with clang, and initially thought that this was some subtle compiler influence on the order of calling dtors in the global destruction phase. Indeed, allocating LMDH after initialising the static member variables will not trigger the issue.

But the underlying issue here is that the line _handle = nullptr; from lmdb::env::close() will be skipped by GCC in the global destruction phase (but not when called e.g. just after lmdb::env::create() in LMDBHook::init()) and "only" when compiled with -O1, -O2 -O3 or -Ofast (so not with -O0, -Og, -Os or -Oz). The 2 trace output exprs in lmdb::env::close() are mine; if I outcomment them the reset instruction is not skipped.

The facts that it happens since so many releases and only under specific circumstances during global destruction makes me think it is maybe not a bug though if so the behaviour should not depend on the optimisation level (I think).

I have made a self-contained/standalone demonstrator that contains the lmdb++ headerfile extended with trace output exprs and where the create and close functions simply allocate the MDB_env structure with new and delete, plus the lmdb headerfile with an added definition of the MDB_env structure (normally it's opaque) : https://github.com/RJVB/gcc_potential_optimiser_bug.git . The README has instructions how to use the Makefile but it should be pretty self-explanatory.

Here's some example output:

> make -B CXX=g++-mp-12 && lmdbhook
g++-mp-12 --version
g++-mp-12 (MacPorts gcc12 12.3.0_4+cpucompat+libcxx) 12.3.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

g++-mp-12 -std=c++11 -O3  -o lmdbhook    lmdbhook.cpp 
LMDBHook::LMDBHook() s_lmdbEnv=0x404248
lmdb::env::env(MDB_env*) this=0x404248 handle=0
lmdb::env::env(MDB_env*) this=0x7ffec95f40f8 handle=0x1333020
lmdb::env::~env() this=0x7ffec95f40f8
void lmdb::env::close() this=0x7ffec95f40f8 handle=0
static bool LMDBHook::init() s_lmdbEnv=0x1333020 handle=0x1333020
void lmdb::env::close() this=0x404248 handle=0x1333020
void lmdb::env_close(MDB_env*) env=0x1333020
static bool LMDBHook::init() s_lmdbEnv=0 handle=0
lmdb::env::env(MDB_env*) this=0x7ffec95f40f8 handle=0x1333020
lmdb::env::~env() this=0x7ffec95f40f8
void lmdb::env::close() this=0x7ffec95f40f8 handle=0
static bool LMDBHook::init() s_lmdbEnv=0x1333020 handle=0x1333020
mapsize=67108864 LZ4 state buffer:16384
LMDB instance is 0x404248
lmdb::env::~env() this=0x404248
void lmdb::env::close() this=0x404248 handle=0x1333020
void lmdb::env_close(MDB_env*) env=0x1333020
LMDBHook::~LMDBHook()
void lmdb::env::close() this=0x404248 handle=0x1333020
void lmdb::env_close(MDB_env*) env=0x1333020
*** Error in `lmdbhook': double free or corruption (!prev): 0x0000000001333020 ***
Abort

> make -B CXX=clang++-mp-12 && lmdbhook
clang++-mp-12 --version
clang version 12.0.1
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/local/libexec/llvm-12/bin
clang++-mp-12 -std=c++11 -O3  -o lmdbhook    lmdbhook.cpp 
LMDBHook::LMDBHook() s_lmdbEnv=0x205090
lmdb::env::env(MDB_env *const) this=0x205090 handle=0
lmdb::env::env(MDB_env *const) this=0x7ffef6c06ca0 handle=0x2e8c20
lmdb::env::~env() this=0x7ffef6c06ca0
void lmdb::env::close() this=0x7ffef6c06ca0 handle=0
static bool LMDBHook::init() s_lmdbEnv=0x2e8c20 handle=0x2e8c20
void lmdb::env::close() this=0x205090 handle=0x2e8c20
void lmdb::env_close(MDB_env *const) env=0x2e8c20
static bool LMDBHook::init() s_lmdbEnv=0 handle=0
lmdb::env::env(MDB_env *const) this=0x7ffef6c06ca0 handle=0x2e8c20
lmdb::env::~env() this=0x7ffef6c06ca0
void lmdb::env::close() this=0x7ffef6c06ca0 handle=0
static bool LMDBHook::init() s_lmdbEnv=0x2e8c20 handle=0x2e8c20
mapsize=67108864 LZ4 state buffer:16384
LMDB instance is 0x205090
lmdb::env::~env() this=0x205090
void lmdb::env::close() this=0x205090 handle=0x2e8c20
void lmdb::env_close(MDB_env *const) env=0x2e8c20
LMDBHook::~LMDBHook()
void lmdb::env::close() this=0x205090 handle=0

EDIT: The above examples are with self-built compilers on Linux but the system compilers show the same behaviour, also on Mac, and the choice of C++ runtime (libc++ or libstdc++) has no influence on either platform.

EDIT2: the demonstrator also has an alternative implementation of lmdb::env::close() as I would have written it, which caches the _handle ptr in a tmp. variable, resets _handle and only then frees the cached pointer. This implementation is not subject to whatever this issue really is.

EDIT3: I had a quick look at the assembly code via gdb (13.2). Not that I can really read it, but I don't see anything in it that could explain conditional behaviour. Built with g++ -O1 -g an with a breakpoint on lmdh::env::close() (which apparently has 6 locations!) and running through until the invocation where things go awry, I see

Breakpoint 1.1, lmdb::env::close (this=0x405250 <LMDBHook::s_lmdbEnv>) at lmdb+++.h:1185
1185        std::cerr << __PRETTY_FUNCTION__ << " this=" << this << " handle=" << handle() << "\n";
(gdb) n
void lmdb::env::close() this=0x405250 handle=0x418020
1187        if (handle()) {
(gdb) n
1188          lmdb::env_close(handle());
(gdb) 
void lmdb::env_close(MDB_env*) env=0x418020
lmdb::env::~env (this=<optimised out>, __in_chrg=<optimised out>) at lmdb+++.h:1189
1189          _handle = nullptr;
(gdb) p _handle
value has been optimised out
(gdb) info line 1189
Line 1189 of "lmdb+++.h" starts at address 0x401672 <_ZN4lmdb3envD2Ev+270> and ends at 0x401679.
(gdb) disas 0x401672,0x401679
Dump of assembler code from 0x401672 to 0x401679:
=> 0x0000000000401672 <_ZN4lmdb3envD2Ev+270>:   add    $0x8,%rsp
   0x0000000000401676 <_ZN4lmdb3envD2Ev+274>:   pop    %rbx
   0x0000000000401677 <_ZN4lmdb3envD2Ev+275>:   pop    %rbp
   0x0000000000401678 <_ZN4lmdb3envD2Ev+276>:   ret
End of assembler dump.
(gdb) c
Continuing.
LMDBHook::~LMDBHook()

Breakpoint 1.2, lmdb::env::close (this=0x405250 <LMDBHook::s_lmdbEnv>) at lmdb+++.h:1185
1185        std::cerr << __PRETTY_FUNCTION__ << " this=" << this << " handle=" << handle() << "\n";
(gdb) c
Continuing.
void lmdb::env::close() this=0x405250 handle=0x418020
void lmdb::env_close(MDB_env*) env=0x418020
*** Error in `/home/bertin/work/src/Scratch/KDE/KF5/LMDBHookTest/lmdbhook': double free or corruption (!prev): 0x0000000000418020 ***

Program received signal SIGABRT, Aborted.

The 2 asm instructions just above the disassembled line 1189 above:

   0x000000000040166a <+262>:   mov    %rbx,%rdi
   0x000000000040166d <+265>:   call   0x4010c0 <_ZdlPv@plt>

Which I can recognise as a call to delete(env). The 3 following instructions don't seem to set anything to NULL but as I said, I don't really read x86 assembly.

Here's the assemble for the same 2 lines built with clang++-12 -O1 -g:

Line 1188 of "./lmdb+++.h" starts at address 0x202f3b <_ZN4lmdb3env5closeEv+107>
   and ends at 0x202f43 <_ZN4lmdb3env5closeEv+115>.
(gdb) info line 1189
Line 1189 of "./lmdb+++.h" starts at address 0x202f4b <_ZN4lmdb3env5closeEv+123>
   and ends at 0x202f52 <_ZN4lmdb3env5closeEv+130>.
(gdb) disas 0x202f3b,0x202f52
Dump of assembler code from 0x202f3b to 0x202f52:
   0x0000000000202f3b <_ZN4lmdb3env5closeEv+107>:       mov    %r14,%rdi
   0x0000000000202f3e <_ZN4lmdb3env5closeEv+110>:       call   0x202f70 <_ZNK4lmdb3env6handleEv>
   0x0000000000202f43 <_ZN4lmdb3env5closeEv+115>:       mov    %rax,%rdi
   0x0000000000202f46 <_ZN4lmdb3env5closeEv+118>:       call   0x202860 <_ZN4lmdbL9env_closeEP7MDB_env>
=> 0x0000000000202f4b <_ZN4lmdb3env5closeEv+123>:       movq   $0x0,(%r14)
End of assembler dump.

Clang doesn't inline the env_close function with -O1, but generates a nice pointer reset even I recognise.

0

There are 0 answers