I was writing some templated code to benchmark a numeric algorithm using both floats and doubles, in order to compare against a GPU implementation.
I discovered that my floating point code was slower and after investigating using Vtune Amplifier from Intel I discovered that g++ was generating extra x86 instructions (cvtps2pd/cvtpd2ps and unpcklps/unpcklpd) to convert some intermediate results from float to double and then back again. The performance degradation is almost 10% for this application.
After compiling with the flag -Wdouble-promotion (which BTW is not included with -Wall or -Wextra), sure enough g++ warned me that the results were being promoted.
I reduced this to a simple test case shown below. Note that the ordering of the c++ code affects the generated code. The compound statement (T d1 = log(r)/r;) produces a warning, whilst the separated version does not (T d = log(r); d/=r;).
The following was compiled with both g++-4.6.3-1ubuntu5 and g++-4.7.3-2ubuntu1~12.04 with the same results.
Compile flags are:
g++-4.7 -O2 -Wdouble-promotion -Wextra -Wall -pedantic -Werror -std=c++0x test.cpp -o test
#include <cstdlib>
#include <iostream>
#include <cmath>
template <typename T>
T f()
{
T r = static_cast<T>(0.001);
// Gives no double promotion warning
T d = log(r);
d/=r;
// Promotes to double
T d1 = log(r)/r;
return d+d1;
}
int main()
{
float f1 = f<float>();
std::cout << f1 << std::endl;
}
I realise that the c++11 standard allows the compiler discretion here. But why does the order matter?
Can I explicitly instruct g++ to use floats only for this calculation?
EDIT: SOLVED by Mike Seymour. Needed to use std::log to ensure picking up the overloaded version of log instead of calling the C double log(double)
. The warning was not generated for the separated statement because this is a conversion and not a promotion.
The problem is
In this implementation, it seems that the only
log
in the global namespace is the C library function,double log(double)
. Remember that it's not specified whether or not the C-library headers in the C++ library dump their definitions into the global namespace as well asnamespace std
.You want
to ensure that the extra overloads defined by the C++ library are available.