Consider this code:
// foo.cxx
int last;

int next() {
    return ++last;
}

int index(int scale) {
    return next() << scale;
}
When compiling with gcc 7.2:
$ g++ -std=c++11 -O3 -fPIC
This emits:
next():
movq last@GOTPCREL(%rip), %rdx
movl (%rdx), %eax
addl $1, %eax
movl %eax, (%rdx)
ret
index(int):
pushq %rbx
movl %edi, %ebx
call next()@PLT ## next() not inlined, call through PLT
movl %ebx, %ecx
sall %cl, %eax
popq %rbx
ret
However, when compiling the same code with the same flags using clang 3.9 instead:
next(): # @next()
movq last@GOTPCREL(%rip), %rcx
movl (%rcx), %eax
incl %eax
movl %eax, (%rcx)
retq
index(int): # @index(int)
movq last@GOTPCREL(%rip), %rcx
movl (%rcx), %eax
incl %eax ## next() was inlined!
movl %eax, (%rcx)
movl %edi, %ecx
shll %cl, %eax
retq
gcc calls next() via the PLT, while clang inlines it. Both still look up the variable last through the GOT. When compiling on Linux, is clang right to make that optimization (so gcc is missing out on easy inlining), is clang wrong to make it, or is this purely a QoI issue?
I don't think the standard goes into that much detail. Roughly, it merely says that a symbol with external linkage is the same symbol in every translation unit in which it appears. That makes clang's version correct.
From that point on, to the best of my knowledge, we're out of the standard. Compilers' choices differ on what they consider a useful -fPIC output.

Note that with g++ -c -std=c++11 -O3 -fPIE, the call to next() is inlined into index(int). So GCC does know how to optimize this. It just chooses not to when using -fPIC. But why? I can see only one explanation: to make it possible to override the symbol during dynamic linking and see the effects consistently. The technique is known as symbol interposition.

In a shared library, if index calls next, then because next is globally visible, gcc has to consider the possibility that next could be interposed. So it uses the PLT. When using -fPIE, however, you are not allowed to interpose symbols, so gcc enables the optimization.

So is clang wrong? No. But gcc seems to provide better support for symbol interposition, which is handy for instrumenting the code. It does so at the cost of some overhead if one uses -fPIC instead of -fPIE when building an executable, though.

Additional notes:
In this blog entry from one of the gcc developers, he mentions, around the end of the post:
Following that lead landed me on the x86-64 ABI spec. In section 3.5.5, it does mandate that all calls to a globally visible symbol must go through the PLT (it goes as far as defining the exact instruction sequence to use depending on the memory model).
So, though it does not violate the C++ standard, ignoring semantic interposition seems to violate the ABI.
Last word: didn't know where to put this, but it might be of interest to you. I'll spare you the dumps, but my tests with objdump and compiler options showed that:
On the gcc side of things:
- gcc -fPIC: accesses to last go through the GOT, calls to next() go through the PLT.
- gcc -fPIC -fno-semantic-interposition: last goes through the GOT, next() is inlined.
- gcc -fPIE: last is IP-relative, next() is inlined.
- -fPIE implies -fno-semantic-interposition.

On the clang side of things:
- clang -fPIC: last goes through the GOT, next() is inlined.
- clang -fPIE: last goes through the GOT, next() is inlined.

A modified version compiles to IP-relative, inlined code on both compilers.
Basically, this explicitly states that, although the symbols are made available globally, the code itself uses hidden versions of them that ignore any kind of interposition. Both compilers then fully optimize the accesses, regardless of the options passed.
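A sketch of what such a modified version could look like, assuming the GCC/Clang attribute syntax on an ELF target (Linux). The names last_hidden and next_impl are made up for illustration, and extern "C" is used only so that the alias target has an unmangled name. This is one possible way to write it, not necessarily the exact listing used for the tests above.

```cpp
// foo.cxx (sketch; last_hidden and next_impl are hypothetical names)

// Hidden definition: not exported from the shared object, so it cannot be interposed.
int last_hidden __attribute__((visibility("hidden")));
// The public name `last` is an alias for the same object.
extern int last __attribute__((alias("last_hidden")));

// Hidden implementation. extern "C" keeps the symbol name unmangled,
// so the alias below can name it directly.
extern "C" __attribute__((visibility("hidden"))) int next_impl() {
    return ++last_hidden;
}
// The public next() is an alias for the hidden implementation.
int next() __attribute__((alias("next_impl")));

int index(int scale) {
    // Use the hidden implementation internally, so no PLT call is required.
    return next_impl() << scale;
}
```

Built with either -fPIC or -fPIE, index(int) should then access the variable IP-relatively and contain no call, while other modules can still link against last and next().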