Trying to understand how pointer chasing is working (basically the access pattern) in the following benchmark:
https://github.com/google/multichase/blob/master/multichase.c
For the simple chase:
static void chase_simple(per_thread_t *t) {
void *p = t->x.cycle[0]; ----> here p is pointing to initial address
do {
x200(p = *(void **)p;) ----> didn't get this statement, is x200 some compiler built-in ?
} while (__sync_add_and_fetch(&t->x.count, 200));
// we never actually reach here, but the compiler doesn't know that
t->x.dummy = (uintptr_t)p;
}
where and how in the code, access pattern is made somewhat unpredictable so that H/W prefetcher doesn't guess it properly?
I'd assume
x200is a CPP macro that repeats its operand 200 times, for loop unrolling. Readx200as "times 200". Possible implementation:That would make sense to amortize the cost of the atomic RMW to increment
t->x.count.You can check by looking at C preprocessor output, e.g.
gcc -E, or search for the macro definition in the included files.The actual statement being repeated,
p = *(void **)p;, is just dereferencing a pointer to get another pointer. Often you'd havep = p->next;if you hadstruct foo *p, but with just avoid*you need a cast to keep the compiler happy and get an assembly instruction likemov rax, [rax].Without a struct that can contain a memory that's a pointer to the same struct type, dereferencing a pointer is going to change the type. C doesn't have a way to declare an infinitely-indirect pointer that just produces the same type when you deref it.
Nowhere in this function; it's just iterating through an existing linked list.
Look for the code that allocates space and stores pointers into it. Probably allocated as one large array, so it has control over the layout, unlike if it did many small mallocs. A random shuffling of the integers 0..n-1 could be mapped transformed to an array of pointers to pointers. Or something like that; perhaps you need to invert that mapping to make sure there aren't short cycles. Or maybe there's some simpler way to generate a linked list that covers all slots in a random order, with the last element pointing to the first so it's a closed loop that can be iterated as many times as you want.