Here is the code for testing.
Tuple test:
using namespace std;
int main(){
vector<tuple<int,int>> v;
for (int var = 0; var < 100000000; ++var) {
v.push_back(make_tuple(var, var));
}
}
Pair test:
#include <vector>
using namespace std;
int main(){
vector<pair<int,int>> v;
for (int var = 0; var < 100000000; ++var) {
v.push_back(make_pair(var, var));
}
}
I did the time measurement via Linux time command. The results are:
| | -O0 | -O2 |
|:------|:-------:|:--------:|
| Pair | 8.9 s | 1.60 s |
| Tuple | 19.8 s | 1.96 s |
I am wondering, why is such a big difference between those two data structures in O0, as they should be very similar. There is just a small difference in 02.
Why is the difference in O0 so big, and why is there any difference at all?
EDIT:
The code with v.resize()
Pair:
#include <vector>
using namespace std;
int main(){
vector<pair<int,int>> v;
v.resize(100000000);
for (int var = 0; var < 100000000; ++var) {
v[var] = make_pair(var, var);
}
}
Tuple:
#include<tuple>
#include<vector>
using namespace std;
int main(){
vector<tuple<int,int>> v;
v.resize(100000000);
for (int var = 0; var < 100000000; ++var) {
v[var] = make_tuple(var, var);
}
}
Results:
| | -O0 | -O2 |
|:------|:-------:|:--------:|
| Pair | 5.01 s | 0.77 s |
| Tuple | 10.6 s | 0.87 s |
EDIT:
My system
g++ (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
GLIBCXX_3.4.19
You are missing some crucial information: What compiler do you use? What do you use to measure the performance of the microbenchmark? What standard library implementation do you use?
My system:
Anyhow, I ran your examples, but reserved the proper size of the vectors first to get rid of the memory allocation overhead. With that, I funnily observe the opposite something interesting - the reverse of what you see:
versus:
as you can see, in my case the reason are the much higher number of stalled cycles, both in the frontend as well as in the backend.
Now where does this come from? I bet it comes down to some failed inlining, similar to what is explained here: std::vector performance regression when enabling C++11
Indeed, enabling
-flto
equalizes the results for me:and for tuple:
So remember,
-flto
is your friend and failed inlining can have extreme results on heavily templated code. Useperf stat
to find out what's happening.