I wanted to try and atomically reset 256 bits using something like this:
#include <x86intrin.h>
#include <iostream>
#include <array>
#include <atomic>
int main(){
std::array<std::atomic<__m256i>, 10> updateArray;
__m256i allZeros = _mm256_setzero_si256();
updateArray[0].fetch_and(allZeros);
}
but I get compiler errors about the element not having fetch_and()
. Is this not possible because 256 bit type is too large to guarantee atomicity?
Is there any other way I can implement this? I am using GCC.
If not, what is the largest type I can reset atomically- 64 bits?
EDIT: Could any AVX instructions perform the fetch-AND atomically?
So there are a few different things that need to be solved:
For #1 and #2:
In x86, there are instructions to do 8, 16, 32, 64, 128, 256 and 512 bit operations. One processor will [at least if the data is aligned to it's own size] perform that operation atomically. However, for an operation to be "true atomic", it also needs to prevent race conditions within the update of that data [in other words, prevent some other processor from reading, modifying and writing back that same location]. Aside from a small number of "implied lock" instructions, this is done by adding a "lock prefix" to a particular instruction - this will perform the right kind of cache-talk [technical term] to the other processors in the system to ensure that ONLY THIS processor can update this data.
We can't use VEX instructions with LOCK prefix (from Intel's manual)
You need a VEX prefix to use AVX instructions, and #UD means "undefined instruction" - in other words, the code will cause a processor exception if we try to execute it.
So, it is 100% certain that the processor can not do an atomic operation on 256 bits at a time. This answer discusses SSE instruction atomicity: SSE instructions: which CPUs can do atomic 16B memory operations?
#3 is pretty meaningless if the instruction isn't valid.
#4 - well, the standard supports
std::atomic<uintmax_t>
, and ifuintmax_t
happens to be 128 or 256 bits, then you could certainly do that. I'm not aware of any processor supporting 128 or higher bits foruintmax_t
, but the language doesn't prevent it.If the requirement for "atomic" isn't as strong as "need to ensure 100% certainly that no other processor updates this at the same time", then using regular SSE, AVX or AVX512 instructions would suffice - but there will be race conditions if you have two processor(cores) doing read/modify/write operations on the same bit of memory simultaneously.
The largest atomic operation on x86 is CMPXCHG16B, which will swap two 64-bit integer registers with the content in memory if the value in two other registers MATCH the value in memory. So you could come up with something that reads one 128-bit value, ands out some bits, and then stores the new value back atomically if nothing else got in there first - if that happened, you have to repeat the operation, and of course, it's not a single atomic and-operation either.
Of course, on other platforms than Intel and AMD, the behaviour may be different.