How do ldrex / strex make atomic_add in ARM an atomic operation?


As per http://lxr.free-electrons.com/source/arch/arm/include/asm/atomic.h#L31

    static inline void atomic_add(int i, atomic_t *v)
    {
            unsigned long tmp;
            int result;

            prefetchw(&v->counter);
            __asm__ __volatile__("@ atomic_add\n"
    "1:     ldrex   %0, [%3]\n"
    "       add     %0, %0, %4\n"
    "       strex   %1, %0, [%3]\n"
    "       teq     %1, #0\n"
    "       bne     1b"
            : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
            : "r" (&v->counter), "Ir" (i)
            : "cc");
    }

How can it be called "atomic" when it can be preempted?


There are 2 answers

gnasher729 On

You seem to be totally misunderstanding what an atomic operation is.

It should be obvious that if you look at a value x, see that it is 13, and then call an atomic_add function that increases it by 5, the new value could be anything, because another thread could have changed x before atomic_add was called. Likewise, if you check the result afterwards, it could again be anything, because another thread could change it between the atomic_add and your check.

An atomic_add function guarantees to leave the value increased by the given amount, and that is exactly what it does. How this is achieved doesn't matter. If a hundred threads each call atomic_add(5, &x), then x will end up increased by 500. That's what matters.
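The "hundred threads" claim can be checked with a small sketch. This is not the kernel's atomic_add; it assumes a hosted C11 environment with POSIX threads and uses atomic_fetch_add from <stdatomic.h> as a stand-in. The names x, add_five and run_hundred_adders are made up for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 100   /* the "hundred threads" from the text */
#define INCREMENT   5

static atomic_int x;      /* hypothetical shared counter */

/* Each thread performs exactly one atomic add; thread order is arbitrary. */
static void *add_five(void *arg)
{
    (void)arg;
    atomic_fetch_add(&x, INCREMENT);
    return NULL;
}

/* Starts NUM_THREADS threads, joins them all, and returns the final value. */
static int run_hundred_adders(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;

    atomic_store(&x, 0);
    for (int t = 0; t < NUM_THREADS; t++) {
        if (pthread_create(&threads[t], NULL, add_five, NULL) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);

    assert(started == NUM_THREADS);   /* the demo is only meaningful if all ran */
    return atomic_load(&x);
}
```

Whatever order the threads run in, run_hundred_adders() should return 500. Replacing atomic_fetch_add with a plain x = x + 5 on a non-atomic int would make the result unpredictable.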

The loop in the question is the typical way atomic operations are implemented on processors such as ARM and PowerPC, which have a load instruction that reserves a memory location and a store instruction that checks whether the reservation still exists.
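The shape of that reserve/check/retry loop can be sketched in portable C11. This is only an analogue, not the kernel code: it uses atomic_compare_exchange_weak_explicit, which (like strex) is allowed to fail and must be retried, but it compares values rather than tracking a reservation. The helper name sketch_atomic_add is hypothetical, and relaxed ordering is chosen because the asm in the question contains no barriers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper mirroring the structure of the ldrex/strex loop. */
static int sketch_atomic_add(int i, atomic_int *v)
{
    /* Roughly the ldrex step: read the current value. */
    int old = atomic_load_explicit(v, memory_order_relaxed);

    /*
     * Roughly the strex + "bne 1b" steps: store old + i only if *v still
     * holds `old`. On failure the call refreshes `old` with the current
     * value, so the next pass recomputes the sum from fresh data.
     */
    while (!atomic_compare_exchange_weak_explicit(v, &old, old + i,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;   /* retry */

    return old + i;   /* the value this call stored */
}
```

One difference worth keeping in mind: a value-based compare cannot tell that the location was changed and then changed back, whereas a reservation-based strex fails on any intervening write.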

artless noise On

The ldrex and strex take a different tactic for implementing atomic instructions. The traditional compare and swap (CAS for short) has limitations in lock-free programming. The load link/store conditional (LL/SC for short) is a more advanced form of atomic instruction that allows atomic linked lists; compare them and think about how they might be implemented in silicon.

On traditional ARM, the swp and swpb instructions provided the atomic mechanism. That they are atomic seems obvious, as each instruction stands by itself. The complication is in multi-CPU designs. The CPU running the swp must lock the bus so that other CPUs cannot read or write the memory while the update is being performed. Typically, this would be the whole bus and not just a single location. Locking the whole bus means that Amdahl's law applies.
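For comparison, here is a minimal sketch of the kind of primitive a single atomic swap makes possible: a spinlock. It uses C11 atomic_exchange_explicit as a stand-in for swp; the type swap_lock and the three functions are hypothetical names for this illustration, not a kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef struct {
    atomic_int locked;
} swap_lock;

/* One swap attempt: write 1 and look at what was there before. */
static bool swap_lock_try(swap_lock *l)
{
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

/* Spin until a swap attempt finds the lock free. */
static void swap_lock_acquire(swap_lock *l)
{
    while (!swap_lock_try(l))
        ;   /* busy-wait */
}

static void swap_lock_release(swap_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

A swap is enough for a lock, but every attempt is an unconditional write, which is part of why the bus cost described above matters.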

As Masta79 points out, atomic does not mean "in one cycle", and that is a distinguishing feature of the ARM ldrex/strex atomic support. Rather than locking the whole bus, only a particular reservation granule (a specific chunk of memory) is reserved for the duration of the ldrex/strex pair. This allows for many different complex lock-free primitives, since many valid instructions are permitted between the ldrex and the strex. However, it comes with the additional complication that the strex can fail: it writes a status value (0 for success, 1 for failure) to a register, which the code must test and retry on, as the teq/bne pair in the question's code does. This is a definite shift in mindset from traditional atomic operations. With specific reservations, each CPU can make progress as long as they are not trying to reserve the same granule at the same time. This should help avoid the Amdahl roadblock (memory bandwidth lost to bus locking).

"How can it be called atomic when it can be preempted?"

An important feature of the code is prefetchw(&v->counter);, which brings the value into the cache ready for writing. The cache is treated as a temporary buffer, and a successful strex commits the new value. Only the cached value is modified until the strex; if the strex finds that the reservation was lost (another CPU committed a write first), the new value is thrown away and the loop retries. An interrupt on the same CPU leads to a clrex, which also clears the reservation and makes the strex fail and retry. So the sequence can be preempted, but a preempted attempt never commits a stale result.
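The retry behaviour can be made concrete with a toy, single-threaded model of one CPU's reservation. This is a deliberately simplified sketch, not how the hardware is built: it tracks one exact address instead of a granule, and every name (toy_monitor, toy_ldrex, toy_strex, toy_clrex, toy_other_cpu_store, toy_atomic_add) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one CPU's reservation state. */
typedef struct {
    const int *addr;    /* location tagged by the last toy_ldrex */
    bool       armed;   /* true while the reservation is intact */
} toy_monitor;

/* Load the value and take a reservation on its address. */
static int toy_ldrex(toy_monitor *m, const int *p)
{
    m->addr = p;
    m->armed = true;
    return *p;
}

/* Returns 0 on success and 1 on failure, like strex's status register. */
static int toy_strex(toy_monitor *m, int *p, int value)
{
    bool ok = m->armed && m->addr == p;
    m->armed = false;           /* the reservation is consumed either way */
    if (!ok)
        return 1;               /* value is discarded, memory untouched */
    *p = value;
    return 0;
}

/* Models clrex, e.g. executed when an interrupt preempts the sequence. */
static void toy_clrex(toy_monitor *m)
{
    m->armed = false;
}

/* Models another CPU writing the reserved location. */
static void toy_other_cpu_store(toy_monitor *m, int *p, int value)
{
    *p = value;
    if (m->addr == p)
        m->armed = false;
}

/*
 * The atomic_add loop from the question, in model form. If `interfere` is
 * true, a conflicting store (+100) lands between the first ldrex and strex.
 * Returns the number of attempts the loop needed.
 */
static int toy_atomic_add(toy_monitor *m, int *p, int i, bool interfere)
{
    int attempts = 0;
    int status;

    do {
        int result = toy_ldrex(m, p) + i;
        if (interfere) {
            toy_other_cpu_store(m, p, *p + 100);
            interfere = false;
        }
        status = toy_strex(m, p, result);
        attempts++;
    } while (status != 0);

    return attempts;
}
```

With a counter starting at 13, an undisturbed toy_atomic_add of 5 takes one attempt and leaves 18. With interference it takes two attempts and leaves 123: the other store's +100 and this call's +5 are both preserved, because the first, stale attempt was never committed.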