Interlocked.Increment vs lock in debug vs release mode

I was testing how Interlocked.Increment and lock behave on my computer's architecture, because I read the following line in this article:

As rewritten with Interlocked.Increment, the method should execute faster, at least on some architectures.

Using the following code, I became convinced that it's worth reviewing the locks in my projects.

using System;
using System.Diagnostics;
using System.Threading;

var watch = new Stopwatch();
var locker = new object();
int counter = 0;

// Increment guarded by a lock statement
watch.Start();
for (int i = 0; i < 100000000; i++)
{
    lock (locker)
    {
        counter++;
    }
}
watch.Stop();
Console.WriteLine(watch.Elapsed.TotalSeconds);

watch.Reset();
counter = 0;

// Increment with Interlocked.Increment
watch.Start();
for (int i = 0; i < 100000000; i++)
{
    Interlocked.Increment(ref counter);
}
watch.Stop();
Console.WriteLine(watch.Elapsed.TotalSeconds);

I'm getting stable results: approximately 2.4 s for locking and 1.2 s for Interlocked. However, I was surprised to discover that running this code in release mode only improves the Interlocked time, to approximately 0.7 s, while the locking time stays the same. Why is that? How is Interlocked optimized in release mode in a way that lock is not?

1 Answer

Hans Passant (BEST ANSWER)

You have to look at the generated machine code to see the difference (Debug + Windows + Disassembly). The debug build version of the Interlocked.Increment() call:

   00FC27AD  call        7327A810 

The release build version:

   025F279D  lock inc    dword ptr [ebp-24h] 

Or in other words, the jitter optimizer got really smart in the Release build and replaced the call to a helper function with a single machine instruction.

Optimization just doesn't get better than that. The same optimization cannot be applied to the Monitor.Enter() call that's underneath the lock statement; it is a pretty substantial function, implemented in the CLR, that cannot be inlined. It does far more than Interlocked.Increment(): it lets the operating system reschedule when a thread blocks trying to acquire the monitor, and it maintains a queue of waiting threads. That can be pretty important for good concurrency, just not in your test code, since the lock is entirely uncontested. Beware of synthetic benchmarks that don't approximate actual usage.
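
For context, the lock statement is compiler shorthand for the Monitor pattern. Roughly, the compiler expands the body of the first benchmark loop into something like the sketch below (the standard C# 4+ expansion; the exact shape can vary between compiler versions, and the lockTaken name is just for illustration):

bool lockTaken = false;
try
{
    // Monitor.Enter may block, letting the OS reschedule this thread
    Monitor.Enter(locker, ref lockTaken);
    counter++; // the protected region
}
finally
{
    // Releases the monitor and wakes a waiting thread, if any
    if (lockTaken)
        Monitor.Exit(locker);
}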
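
If you want to measure something closer to real usage, a contended variant is a better test. Here is a rough sketch (not part of the original question; the choice of 4 tasks and the per-thread iteration split are arbitrary, purely for illustration):

using System;
using System.Diagnostics;
using System.Threading.Tasks;

var locker = new object();
int counter = 0;
var watch = Stopwatch.StartNew();

// Same 100 million increments in total, but split across 4 tasks
// so that the lock is actually contended.
var tasks = new Task[4];
for (int t = 0; t < tasks.Length; t++)
{
    tasks[t] = Task.Run(() =>
    {
        for (int i = 0; i < 25000000; i++)
        {
            lock (locker)
            {
                counter++;
            }
        }
    });
}
Task.WaitAll(tasks);

watch.Stop();
Console.WriteLine($"{watch.Elapsed.TotalSeconds} s, counter = {counter}");

With several threads fighting over the same monitor, the bookkeeping that Monitor.Enter() does is actually exercised, which the single-threaded loop never shows.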