Haskell IORef usage in concurrent setting


I am having a hard time understanding Haskell's memory model. I got stuck implementing a simple lock-free hash map, and while trying to reproduce the error on a smaller example I realized I have no idea what is going on. I am only targeting x86_64.

data TicketLock = TicketLock {cur :: !(IORef Int), next :: !(IORef Int)}

testTicket = do
  l <- TicketLock <$> newIORef 0 <*> newIORef 0
  mapConcurrently_ (\x -> replicateM_ 500000 (lockunlock l)) [0 .. 3]
  print "finished"

lockunlock :: TicketLock -> IO ()
lockunlock (TicketLock {..}) = do
  myticket <- atomicModifyIORef' next (\a -> (a + 1, a))
  let wait = do
        x <- readIORef cur -- atomicModifyIORef' cur (\a -> (a, a)) works correctly
        unless (x == myticket) wait
  wait
  print "lock acquired" -- to observe it gets stuck  
  atomicModifyIORef' cur (\a -> (a + 1, ())) -- unlock

The test gets stuck after a couple of thousand iterations.

What is going wrong with this code? I kind of expect it to fail, because I guess IORef operations can be reordered by the compiler, but I can't figure out what exactly is going on. Is it observable in the generated STG or Cmm code?

My real question is: why does adding barriers not fix the problem? I tried adding barriers (from the atomic-primops package) after every line of the lockunlock function and the problem persists.

I also tried using AtomicCounter from the atomic-primops package, and the problem is the same: even though incrCounter is implemented with a fetch-add instruction, the test still gets stuck after a couple of thousand iterations, even with barriers all over the code.

data BetterTicketLock = BetterTicketLock {curb :: !AtomicCounter, nextb :: !AtomicCounter}

testTicketb = do
  l <- BetterTicketLock <$> newCounter 1 <*> newCounter 0
  mapConcurrently_ (\x -> replicateM_ 500000 (lockunlockb l)) [0 .. 3]
  print "finished"

lockunlockb :: BetterTicketLock -> IO ()
lockunlockb (BetterTicketLock {..}) = do
  myticket <- incrCounter 1 nextb -- it returns new value not old value
  let wait = do
        x <- incrCounter 0 curb -- I am not even using readCounter which is similar to readIOref
        unless (x == myticket) wait
  wait
  print "lock acquired" 
  void $ incrCounter 1 curb  

Ideally, I want a way to reason about Haskell programs the way I can in Rust, using acquire-release semantics. How can the following be converted to Haskell?

struct TicketLock {
    current: AtomicUsize,
    next: AtomicUsize,
}

impl TicketLock {
    fn new() -> Self {
        TicketLock {
            current: AtomicUsize::new(0),
            next: AtomicUsize::new(0),
        }
    }

    fn lockunlock(&self) {
        let ticket = self.next.fetch_add(1, Ordering::Relaxed);
        while self.current.load(Ordering::Acquire) != ticket {}
        // critical section
        let _ = self.current.fetch_add(1, Ordering::Release);
    }
}
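GHC's IORef API does not expose Relaxed/Acquire/Release orderings, so a close Haskell analogue has to fall back on atomicModifyIORef' (which acts as a full barrier) for both atomic updates, plus a yield in the spin loop so the non-allocating wait cooperates with the scheduler. A sketch under those assumptions (the newTicketLock/withTicketLock names and the self-test in main are mine, not from any library):

```haskell
import Control.Concurrent (forkIO, yield)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, replicateM_, unless)
import Data.IORef

data TicketLock = TicketLock
  { cur  :: !(IORef Int)  -- ticket currently being served
  , next :: !(IORef Int)  -- next ticket to hand out
  }

newTicketLock :: IO TicketLock
newTicketLock = TicketLock <$> newIORef 0 <*> newIORef 0

-- Take a ticket, spin until it is served, run the body, then serve the
-- next ticket. The yield lets other threads run, since the spin itself
-- does not allocate and would otherwise never be pre-empted.
withTicketLock :: TicketLock -> IO a -> IO a
withTicketLock (TicketLock curRef nextRef) body = do
  myticket <- atomicModifyIORef' nextRef (\a -> (a + 1, a))
  let wait = do
        x <- readIORef curRef
        unless (x == myticket) (yield >> wait)
  wait
  r <- body
  atomicModifyIORef' curRef (\a -> (a + 1, ()))
  pure r

-- Self-test: four threads each perform 10000 locked increments of a
-- shared counter; the final total checks that the lock serializes them.
main :: IO ()
main = do
  l <- newTicketLock
  total <- newIORef (0 :: Int)
  dones <- forM [1 .. 4 :: Int] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ 10000 (withTicketLock l (modifyIORef' total (+ 1)))
      putMVar done ()
    pure done
  mapM_ takeMVar dones
  readIORef total >>= print
```

This is weaker than the Rust version in the sense that every atomic operation pays for a full barrier, but on x86_64 that cost is small, and it sidesteps the ordering question entirely.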

Edit: For some reason the following works as expected, without any performance penalty. Is it because barriers are ignored in tight loops?

lockunlock (TicketLock {..}) s = do
  myticket <- atomicModifyIORef' next (\a -> (a + 1, a))
  let wait = do
        x <- readIORef cur
        unless (x == myticket) $ threadDelay 0 >> wait
  wait
  modifyIORef' s (+ 1)
  atomicModifyIORef' cur (\a -> (a + 1, ())) -- unlock

1 Answer

Answer by Noughtmare (score 9):

You're running into a pathological case with the implementation of pre-emptive multitasking in GHC. See the documentation of the Control.Concurrent module:

GHC implements pre-emptive multitasking: the execution of threads are interleaved in a random fashion. More specifically, a thread may be pre-empted whenever it allocates some memory, which unfortunately means that tight loops which do no allocation tend to lock out other threads (this only seems to happen with pathological benchmark-style code, however).

Your wait loop seems to not allocate anything, so it cannot be pre-empted. You can solve this issue by inserting a manual yield:

lockunlock :: TicketLock -> IO ()
lockunlock (TicketLock{..}) = do
  myticket <- atomicModifyIORef' next (\a -> (a + 1, a))
  let wait = do
        x <- readIORef cur
        unless (x == myticket) (yield *> wait)
  wait
  print "lock acquired"
  atomicModifyIORef' cur (\a -> (a + 1, ()))

Alternatively, you can compile with -fno-omit-yields.
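For reference, a minimal sketch of such an invocation (assuming the program lives in Main.hs; the flags themselves are documented in the GHC User's Guide):

```shell
# Keep yield points even in non-allocating loops so the scheduler can
# pre-empt them, and link against the threaded runtime.
ghc -O2 -threaded -fno-omit-yields Main.hs

# Run on 4 capabilities (OS threads).
./Main +RTS -N4
```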

I haven't been able to figure out anything more by trying to debug with the eventlog, because even that seems to be blocked. As K.A. Buhr remarks below, he has written a better answer:

Less obviously, the same thing can happen even if you compile with -threaded and use +RTS -N options. The problem is that an uncooperative thread can starve out the runtime scheduler itself. If at some point the uncooperative thread is the only thread currently scheduled to run, it will become uninterruptible, and the scheduler will never be re-run to consider scheduling additional threads, even if they could run on other O/S threads.