What prevents g++ from eliminating temporary std::array not used in runtime?

640 views Asked by At
#include <array>
#include <cassert>

class P {
  public:
    P() : _value(nullptr) {}
    ~P() { delete _value; }

  private:
   char *_value;
};

void foo() {
  if(std::array<P, 4>().size() != 4)
    assert(false);
}

The function foo() creates a temporary array to check the size is what the programmer expected. With -O1 or higher g++ figures out the assert will not fail and the call to __assert_fail is removed from the generated code. But g++ still generates code to first construct and then destruct the now unused array.

g++ -std=c++11 -O3 [4.8.2]:

0000000000000000 <_Z3foov>:1
   0:       55                      push   %rbp1
   1:       66 0f ef c0             pxor   %xmm0,%xmm01
   5:       53                      push   %rbx1
   6:       48 83 ec 28             sub    $0x28,%rsp1
   a:       66 0f 7f 04 24          movdqa %xmm0,(%rsp)1
   f:       48 8d 5c 24 20          lea    0x20(%rsp),%rbx1
  14:       48 89 e5                mov    %rsp,%rbp1
  17:       66 0f 7f 44 24 10       movdqa %xmm0,0x10(%rsp)1
  1d:       0f 1f 00                nopl   (%rax)1
  20:       48 83 eb 08             sub    $0x8,%rbx1
  24:       48 8b 3b                mov    (%rbx),%rdi1
  27:       e8 00 00 00 00          callq  2c <_Z3foov+0x2c>1
  2c:       48 39 eb                cmp    %rbp,%rbx1
  2f:       75 ef                   jne    20 <_Z3foov+0x20>1
  31:       48 83 c4 28             add    $0x28,%rsp1
  35:       5b                      pop    %rbx1
  36:       5d                      pop    %rbp1
  37:       c3                      retq   1

clang on the other hand removes all code except the return statement.

clang -std=c++11 -O3:

0000000000000000 <_Z3foov>:1
   0:       c3                      retq   1

Just bad luck with g++ or is there a reason for the difference?

3

There are 3 answers

1
delehef On

Because there could be side effects in the constructor of std::array. However, as g++ and clang don't use the same standard library (libstdc++ for g++ and libc++ for clang).

For the question why does clang remove the code, maybe that in libc++ the std::array's constructor in inlined, so the optimizer can see that there is no side effects, therefore need to keep the code.

3
David Greene On

This could happen for any number of reasons. Perhaps gcc doesn't inline as well as clang for some reason. Turning up the inlining knobs on gcc might help. Or perhaps there is something else going on like an aliasing issue gcc can't resolve for some reason. It's impossible to know without tracing through gcc to discover the details.

Bottom line, it's just two different compilers doing different code transformations. gcc could be enhanced to cover this case.

13
user3125280 On

Firstly, great question.

EDIT:

I didn't really read your code properly the first time. There is an important external call in your code. This is in this instruction e8 00 00 00 00 callq 2c <_Z3foov+0x2c>1. It does not call the address ahead of itself - instead it's target will get replaced at link time. This is how linking works - the elf file will say "such and such an instruction will resolve to this target at link time." The assembler has not been listed completely, and so we don't know the address of this call. Presumably it is delete _value in the code, however libstdc++ (with delet, etc) is dynamically linked by default. You can change this by using my compiler flags, or do your listing inside gdb (i.e. after linking).

Back to the answer:

This is a special case where gcc programmers have made a choice not to optimize. The reason being that operators new and delete are marked as 'weak symbols' and the linker will look for alternatives, choosing a user supplied or falling back if none is found.

Here is a discussion of the rationale behind this. These operators are designed to be globally replaceable, and weak linking is one solution.

Statically linking doesn't change this, because free and malloc in glibc can still be changed at linktime.

Statically linking, or using link time optimization should use the in-built delete, however in this case, if you instead linked statically, the chance is still missed. In your original example, however, there is no such chance.

Compiling this:

#include <array>
#include <cassert>
#include <cstdlib>

void * operator new(std::size_t n) throw(std::bad_alloc)
{
  return malloc(n);
}
void operator delete(void * p) throw()
{
if(p != nullptr)
  free(p);
}

class P {
  public:
    P() : _value(nullptr) {}
    ~P() { delete _value; }

  private:
   char *_value;
};

void foo() {
  if(std::array<P, 4>().size() != 4)
    assert(false);
}

int main(){
foo();
}

with this; g++ -std=c++11 -O3 -static -Wa,-alh test.cpp -o test

links to libstdc++/glibc statically, so that it should know what free and malloc are, however it doesn't quite realise foo is trivial.

However, nm -gC test | grep free produces the weak symbol __free_hook, which I found an explanation for here. So the behaviour of free and malloc (and hence, operator new and delete) is actually always changeable at link time when compiling with gcc. That is what prevents the optimization - annoyingly the -fno-weak leaves those symbols there.

The above paragraph is true, but is a result of the optimization being missed, not a reason to avoid optimization.

With link time optimization it is possible gcc can be coaxed into doing this optimization, but first you must build everything else with -flto, including libstdc++.

Here is the most gcc does for me when building the example statically:

Dump of assembler code for function _Z3foov:
0x08048ef0 <+0>:    push   %esi
0x08048ef1 <+1>:    push   %ebx
0x08048ef2 <+2>:    sub    $0x24,%esp
0x08048ef5 <+5>:    movl   $0x0,0x10(%esp)
0x08048efd <+13>:   lea    0x20(%esp),%ebx
0x08048f01 <+17>:   movl   $0x0,0x14(%esp)
0x08048f09 <+25>:   lea    0x10(%esp),%esi
0x08048f0d <+29>:   movl   $0x0,0x18(%esp)
0x08048f15 <+37>:   movl   $0x0,0x1c(%esp)
0x08048f1d <+45>:   lea    0x0(%esi),%esi
0x08048f20 <+48>:   sub    $0x4,%ebx
0x08048f23 <+51>:   mov    (%ebx),%eax
0x08048f25 <+53>:   test   %eax,%eax
0x08048f27 <+55>:   je     0x8048f31 <_Z3foov+65>
0x08048f29 <+57>:   mov    %eax,(%esp)
0x08048f2c <+60>:   call   0x804dea0 <free>
0x08048f31 <+65>:   cmp    %esi,%ebx
0x08048f33 <+67>:   jne    0x8048f20 <_Z3foov+48>
0x08048f35 <+69>:   add    $0x24,%esp
0x08048f38 <+72>:   pop    %ebx
0x08048f39 <+73>:   pop    %esi
0x08048f3a <+74>:   ret  

The call to free isn't going anywhere.

If it is a problem, optimize it yourself. Templates make this easy (assuming std::array is passed as a template argument, otherwise why are you checking it's size()?).

#include <array>
#include <cassert>

class P {
  public:
    P() : _value(nullptr) {}
    ~P() { delete _value; }

  private:
   char *_value;
};

void foo() {
  if(std::tuple_size<std::array<P, 4> >::value != 4)
    assert(false);
}

int main(){
foo();
}

The code can be made to fail silently if std::array<P, 4> is vector or something, and fall back on you default construction method.

nm -C test outputs W operator new(unsigned int, void*) when I added #include <new>, so placement new at least can be changed link time (it is a weak symbol) - the others are special, and their eventual targets reside in libstdc++ (again, they are linked dynamically by default).