What is the fastest way to compare two u_int64_t[8] arrays in C/C++?
Array 1 is inside a std::vector (~10k elements); array 2 is inside a dynamically allocated struct. (Is memcmp() free of false positives here?)
My (pseudo-C) implementation:
    typedef struct {
        u_int64_t array[8];
    } work_t;

    /* alloc and fill array work_t* work = new (std::nothrow) work_t etc... */

    for (u_int32_t i = 0; i < some_std_vector.size(); i++) {
        if ((some_std_vector[i]->array[0] == work->array[0]) &&
            (some_std_vector[i]->array[1] == work->array[1]) &&
            (some_std_vector[i]->array[2] == work->array[2]) &&
            (some_std_vector[i]->array[3] == work->array[3]) &&
            (some_std_vector[i]->array[4] == work->array[4]) &&
            (some_std_vector[i]->array[5] == work->array[5]) &&
            (some_std_vector[i]->array[6] == work->array[6]) &&
            (some_std_vector[i]->array[7] == work->array[7])) {
            //...do some stuff...
        }
    }
The target platform is Linux x86_64 with gcc 4.9.2. The loop runs inside a pthread, tcmalloc is used, and the code is compiled with -O2.
Here are some suggestions to improve the speed.
Use Local Variables if possible
Instead of going through pointers (e.g. the -> operator) on every access, use local variables or pass the variables as references. Otherwise the compiler may generate extra code to load the pointer into a register and then dereference the register to get the value.
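A minimal sketch of that idea, assuming the vector holds work_t* as in the question. The function name count_matches is hypothetical, and std::uint64_t stands in for u_int64_t. The key is copied into a local once, and each element's array is bound to a reference before it is compared. An optimizer at -O2 may already do this on its own, so treat it as something to measure, not a guaranteed win.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct work_t {
    std::uint64_t array[8];
};

// Counts the entries whose array equals work->array.
// count_matches is a hypothetical name used only for this illustration.
std::size_t count_matches(const std::vector<work_t*>& items, const work_t* work) {
    const work_t target = *work;  // local copy of the 64-byte key, read once
    std::size_t matches = 0;
    for (const work_t* item : items) {
        const std::uint64_t (&a)[8] = item->array;  // reference, no copy
        bool equal = true;
        for (int k = 0; k < 8 && equal; ++k) {
            equal = (a[k] == target.array[k]);
        }
        if (equal) {
            ++matches;
        }
    }
    return matches;
}
```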
Use Processor's Data Cache
Most modern processors have a data cache. If you load several values first and then compare them, the loads are more likely to be served from the data cache.
Also, design your data to fit efficiently into a data cache line. This means that data members (arrays included) should be next to each other or very close.
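One way to keep the data close together, sketched under the assumption that the elements can be stored by value: use std::vector<work_t> instead of a vector of pointers, so the ~10k arrays sit back to back in one allocation. The function name find_first is hypothetical. On typical x86_64 processors a cache line is 64 bytes, the same size as one work_t, but nothing here forces the elements to start on a line boundary.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct work_t {
    std::uint64_t array[8];
};

// The struct is expected to be exactly 8 * 8 = 64 bytes with no padding.
static_assert(sizeof(work_t) == 64, "work_t is expected to be 64 bytes");

// Returns the index of the first element equal to key, or -1 if none matches.
// find_first is a hypothetical helper; elements are stored contiguously by value.
std::ptrdiff_t find_first(const std::vector<work_t>& items, const work_t& key) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (std::equal(items[i].array, items[i].array + 8, key.array)) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```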
Block Compare
At the lowest level you are comparing many consecutive bytes. As others have mentioned, you may get better performance by using a memory compare function.
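A minimal sketch of the block compare with std::memcmp; same_array is a hypothetical helper name, and std::uint64_t stands in for u_int64_t. Only the array member is compared. An array of eight 64-bit integers has no padding between its elements, and the fixed-width type has no padding bits, so the result is 0 exactly when all eight values are equal. That should make it free of false positives for this layout.

```cpp
#include <cstdint>
#include <cstring>

struct work_t {
    std::uint64_t array[8];
};

// Returns true when both arrays hold the same eight values.
// same_array is a hypothetical name used only for this illustration.
inline bool same_array(const work_t& lhs, const work_t& rhs) {
    // sizeof lhs.array is 64: the compare covers the array and nothing else.
    return std::memcmp(lhs.array, rhs.array, sizeof lhs.array) == 0;
}
```

In the loop from the question this would be called as same_array(*some_std_vector[i], *work).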
Another suggestion is to help the compiler by loading the values into separate variables and then comparing them.
The concept here is to load the variables into multiple registers first and then compare the registers.
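A sketch of the load-then-compare idea, with the hypothetical name same_array_loaded. All sixteen values are read into locals first, and the comparison is done afterwards with XOR and OR so that there is no early exit between the loads. Whether the locals actually end up in registers is up to the compiler, so this is an assumption to check in the generated assembly.

```cpp
#include <cstdint>

struct work_t {
    std::uint64_t array[8];
};

// Loads every element into a local first, then compares.
// same_array_loaded is a hypothetical name used only for this illustration.
inline bool same_array_loaded(const work_t& lhs, const work_t& rhs) {
    // Load phase: no comparisons yet.
    const std::uint64_t a0 = lhs.array[0], b0 = rhs.array[0];
    const std::uint64_t a1 = lhs.array[1], b1 = rhs.array[1];
    const std::uint64_t a2 = lhs.array[2], b2 = rhs.array[2];
    const std::uint64_t a3 = lhs.array[3], b3 = rhs.array[3];
    const std::uint64_t a4 = lhs.array[4], b4 = rhs.array[4];
    const std::uint64_t a5 = lhs.array[5], b5 = rhs.array[5];
    const std::uint64_t a6 = lhs.array[6], b6 = rhs.array[6];
    const std::uint64_t a7 = lhs.array[7], b7 = rhs.array[7];

    // Compare phase: a ^ b is zero only when a == b, so the OR of all
    // eight XOR results is zero only when every pair matches.
    const std::uint64_t diff = (a0 ^ b0) | (a1 ^ b1) | (a2 ^ b2) | (a3 ^ b3) |
                               (a4 ^ b4) | (a5 ^ b5) | (a6 ^ b6) | (a7 ^ b7);
    return diff == 0;
}
```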
Review Assembly Language & Profile
Whichever of the techniques presented in the answers you choose, the best approach is to code it up, review the generated assembly language, and profile. Remember to compile with a high optimization level for speed.
If your processor has special instructions that can make this faster, you want to verify that the compiler is using them or that there is justification for not using them.
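As one example of such instructions, x86_64 always provides SSE2, which can compare 16 bytes at a time. The sketch below uses the SSE2 intrinsics from <emmintrin.h> and falls back to std::memcmp when SSE2 is not available. The name same_array_simd is hypothetical, and whether this beats what gcc generates for the plain comparison is not assumed; it needs to be profiled.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

struct work_t {
    std::uint64_t array[8];
};

// Compares the two 64-byte arrays in four 16-byte steps when SSE2 is present.
// same_array_simd is a hypothetical name used only for this illustration.
inline bool same_array_simd(const work_t& lhs, const work_t& rhs) {
#if defined(__SSE2__)
    const __m128i* a = reinterpret_cast<const __m128i*>(lhs.array);
    const __m128i* b = reinterpret_cast<const __m128i*>(rhs.array);
    __m128i all_equal = _mm_set1_epi8(-1);  // every bit set
    for (int k = 0; k < 4; ++k) {
        // Unaligned loads: the arrays are not assumed to be 16-byte aligned.
        const __m128i eq = _mm_cmpeq_epi8(_mm_loadu_si128(a + k),
                                          _mm_loadu_si128(b + k));
        all_equal = _mm_and_si128(all_equal, eq);  // stays all-ones only on a match
    }
    // One mask bit per byte; 0xFFFF means all 16 byte lanes stayed equal.
    return _mm_movemask_epi8(all_equal) == 0xFFFF;
#else
    return std::memcmp(lhs.array, rhs.array, sizeof lhs.array) == 0;
#endif
}
```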