Why is clang not optimizing this with NRVO?


I'm trying to understand why a reasonably good C++11 compiler (clang) does not optimize this code, and I'm wondering whether anybody here has an opinion.

#include <iostream>
#include <utility>  // std::move
#define SLOW

struct A {
  A() {}
  ~A() { std::cout << "A d'tor\n"; }
  A(const A&) { std::cout << "A copy\n"; }
  A(A&&) { std::cout << "A move\n"; }
  A &operator =(A) { std::cout << "A copy assignment\n"; return *this; }
};

struct B {
  // Using move on a sink. 
  // Nice talk at Going Native 2013 by Sean Parent.
  B(A foo) : a_(std::move(foo)) {}  
  A a_;
};

A MakeA() {
  return A();
}

B MakeB() {  
 // The key bits are in here
#ifdef SLOW
  A a(MakeA());
  return B(a);
#else
  return B(MakeA());
#endif
}

int main() {
  std::cout << "Hello World!\n";
  B obj = MakeB();
  std::cout << &obj << "\n";
  return 0;
}

If I run this with #define SLOW commented out and optimized with -s I get

Hello World!
A move
A d'tor
0x7fff5fbff9f0
A d'tor

which is expected.

If I run this with #define SLOW enabled and optimized with -s I get:

Hello World!
A copy
A move
A d'tor
A d'tor
0x7fff5fbff9e8
A d'tor

Which obviously isn't as nice. So the question is:

Why am I not seeing NRVO applied in the "SLOW" case? I know that the compiler is not required to apply NRVO, but this seems like such a common, simple case.

In general I try to encourage code of the "SLOW" style because I find it much easier to debug.


There are 2 answers

Dietmar Kühl On BEST ANSWER

The simple answer is: because the compiler is not allowed to apply copy elision in this case. Copy elision is permitted only in a few specific situations, which the C++11 standard lists in 12.8 [class.copy] paragraph 31:

... This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies):

  • in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value
  • [...]

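Here the return expression is B(a), which is not the name of an automatic object, and the local object a has type A rather than the return type B. So this bullet does not permit eliding the copy of a. The other bullets in the same paragraph cover throw expressions, copies from a temporary, and exception declarations. None of them applies to the copy of a into the constructor parameter, because a is a named local object, not a temporary.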
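To make the rule concrete, here is a minimal sketch that counts constructor calls instead of printing them. The names Tracked, Wrapper, ReturnByName, ReturnWrapped, CopiesByName and CopiesWrapped are hypothetical and not from the question. Move counts depend on the compiler and language mode, so only copy counts are checked.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type: counts copy and move constructions.
struct Tracked {
  static int copies;
  static int moves;
  Tracked() {}
  Tracked(const Tracked&) { ++copies; }
  Tracked(Tracked&&) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Same "sink" shape as B in the question: take by value, then move.
struct Wrapper {
  Wrapper(Tracked t) : t_(std::move(t)) {}
  Tracked t_;
};

// The return expression is the name of a local whose type matches the
// return type, so elision is permitted. If the compiler does not elide,
// the local is moved, never copied.
Tracked ReturnByName() {
  Tracked t;
  return t;
}

// The return expression is Wrapper(t), not the name of a local. t is an
// lvalue, so the by-value constructor parameter is copy-constructed.
Wrapper ReturnWrapped() {
  Tracked t;
  return Wrapper(t);
}

// Number of copy constructions when returning a local by name.
int CopiesByName() {
  Tracked::copies = 0;
  Tracked r = ReturnByName();
  (void)r;
  return Tracked::copies;
}

// Number of copy constructions when wrapping a local in the return.
int CopiesWrapped() {
  Tracked::copies = 0;
  Wrapper w = ReturnWrapped();
  (void)w;
  // The copy into the by-value parameter cannot be elided.
  assert(Tracked::copies >= 1);
  return Tracked::copies;
}
```

Calling CopiesByName() should report no copies, while CopiesWrapped() should report exactly one: the copy of the local into the constructor parameter, which the standard does not allow the compiler to remove.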

Mihai On

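The copy that you see in the slow path is caused not by a lack of RVO but by value category: in B(MakeA()), MakeA() is an rvalue, whereas in B(a), a is an lvalue. An lvalue argument selects A's copy constructor when initializing the by-value parameter foo; an rvalue selects the move constructor (or is elided).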

To make this clear let's modify the slow path to indicate where MakeA() is complete:

#ifdef SLOW
  A a(MakeA());
  std::cout << "---- after call \n";
  return B(a);
#else

The output is:

Hello World!
---- after call 
A copy
A move
A d'tor
A d'tor
0x7fff5a831b28
A d'tor

Which shows that no copy was done in

A a(MakeA());

Thus, RVO did happen.

The fix, which replaces the copy with a move, is:

return B(std::move(a));
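As a rough check of that fix, the sketch below compares the two variants by counting constructor calls. Counted, Holder, MakeCounted, MakeHolderCopy, MakeHolderMove, Counts, Measure and CheckFix are hypothetical names standing in for the question's A, B, MakeA and MakeB. The exact number of moves can vary with the compiler and standard mode, so treat the move count as a lower bound.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for A: counts copy and move constructions.
struct Counted {
  static int copies;
  static int moves;
  Counted() {}
  Counted(const Counted&) { ++copies; }
  Counted(Counted&&) { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Hypothetical stand-in for B: by-value sink parameter, moved into place.
struct Holder {
  Holder(Counted c) : c_(std::move(c)) {}
  Counted c_;
};

Counted MakeCounted() { return Counted(); }

// The "SLOW" variant: c is an lvalue, so it is copied into the parameter.
Holder MakeHolderCopy() {
  Counted c(MakeCounted());
  return Holder(c);
}

// The fixed variant: std::move turns c into an rvalue, so it is moved.
Holder MakeHolderMove() {
  Counted c(MakeCounted());
  return Holder(std::move(c));
}

struct Counts {
  int copies;
  int moves;
};

// Resets the counters, runs one factory, and reports what it cost.
Counts Measure(Holder (*make)()) {
  Counted::copies = 0;
  Counted::moves = 0;
  Holder h = make();
  (void)h;
  Counts result = {Counted::copies, Counted::moves};
  return result;
}

// Sanity check: the std::move variant performs no copy construction.
void CheckFix() {
  assert(Measure(MakeHolderMove).copies == 0);
}
```

Note that the std::move variant still pays for two moves (local into the parameter, parameter into the member), one more than B(MakeA()) in the question, so it removes the copy rather than all of the work.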