I tried to compare the overhead of std::visit (std::variant polymorphism) and virtual functions (std::unique_ptr polymorphism). Please note that my question is not about overhead or performance, but about optimization.
Here is my code.
https://quick-bench.com/q/pJWzmPlLdpjS5BvrtMb5hUWaPf0
#include <benchmark/benchmark.h>  // Google Benchmark (benchmark::State, BENCHMARK)
#include <memory>
#include <variant>

struct Base
{
    virtual ~Base() = default;  // Derived is destroyed through a Base pointer
    virtual void Process() = 0;
};

struct Derived : public Base
{
    void Process() override { ++a; }
    int a = 0;
};

struct VarDerived
{
    void Process() { ++a; }
    int a = 0;
};

static std::unique_ptr<Base> ptr;
static std::variant<VarDerived> var;

static void PointerPolyMorphism(benchmark::State& state)
{
    ptr = std::make_unique<Derived>();
    for (auto _ : state)
    {
        for (int i = 0; i < 1000000; ++i)
            ptr->Process();
    }
}
BENCHMARK(PointerPolyMorphism);

static void VariantPolyMorphism(benchmark::State& state)
{
    var.emplace<VarDerived>();
    for (auto _ : state)
    {
        for (int i = 0; i < 1000000; ++i)
            std::visit([](auto&& x) { x.Process(); }, var);
    }
}
BENCHMARK(VariantPolyMorphism);
I know it's not a good benchmark; it was only a draft I wrote while testing. But I was surprised by the result. Without any optimization, the std::visit benchmark scored high (which means slow). But when I turn on optimization (higher than O2), the std::visit benchmark scores extremely low (which means extremely fast), while the std::unique_ptr one doesn't.
I'm wondering why the same optimization can't be applied to the std::unique_ptr polymorphism?
I've compiled your code with Clang++ to LLVM IR (without your benchmarking) with -Ofast. Unsurprisingly, the loop in VariantPolyMorphism is removed entirely.

PointerPolyMorphism, on the other hand, really does execute the loop and all the calls.

The reason for this is that both your variables are static. This allows the compiler to infer that no code outside the translation unit has access to your variant instance. Therefore your loop doesn't have any visible effect and can be safely removed. However, although your smart pointer is static, the memory it points to could still change (as a side effect of the call to Process, for example). The compiler therefore cannot easily prove that it is safe to remove the loop, and doesn't.

If you remove the static from both variables, VariantPolyMorphism is reduced to writing the final value of var, which again isn't surprising. The variant can only contain VarDerived, so nothing needs to be computed at run-time: the final state of the variant can already be determined at compile-time. The difference now, though, is that some other translation unit might want to access the value of var later on, and the value must therefore be written.
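
One way to probe the observable-effect argument yourself is to make both versions return the counter, so that the increments have a visible effect in either case. The following is only a minimal sketch under that assumption: RunPointer and RunVariant are hypothetical names that are not part of the original benchmark, and the benchmark harness is left out so the functions can be compiled on their own and inspected.

```cpp
#include <cassert>
#include <memory>
#include <variant>

struct Base
{
    virtual ~Base() = default;
    virtual void Process() = 0;
};

struct Derived : public Base
{
    void Process() override { ++a; }
    int a = 0;
};

struct VarDerived
{
    void Process() { ++a; }
    int a = 0;
};

// Internal linkage, as in the question: no other translation unit can name these.
static std::unique_ptr<Base> ptr;
static std::variant<VarDerived> var;

// Hypothetical helper: same loop as PointerPolyMorphism, but the counter is
// returned, so the caller can observe the result of the increments.
int RunPointer(int iterations)
{
    ptr = std::make_unique<Derived>();
    for (int i = 0; i < iterations; ++i)
        ptr->Process();  // virtual call on a heap-allocated object
    return static_cast<Derived&>(*ptr).a;
}

// Hypothetical helper: same loop as VariantPolyMorphism, counter returned.
int RunVariant(int iterations)
{
    var.emplace<VarDerived>();
    for (int i = 0; i < iterations; ++i)
        std::visit([](auto& x) { x.Process(); }, var);
    return std::get<VarDerived>(var).a;
}
```

Both functions compute the same count. Whether a given compiler then folds either loop into a single addition depends on the compiler and its flags, so it is worth looking at the generated code rather than assuming.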