What is the difference between unsafeDupablePerformIO and accursedUnutterablePerformIO?

I was wandering in the Restricted Section of the Haskell Library and found these two vile spells:

{- System.IO.Unsafe -}
unsafeDupablePerformIO  :: IO a -> a
unsafeDupablePerformIO (IO m) = case runRW# m of (# _, a #) -> a

{- Data.ByteString.Internal -}
accursedUnutterablePerformIO :: IO a -> a
accursedUnutterablePerformIO (IO m) = case m realWorld# of (# _, r #) -> r

However, the only actual difference seems to be between runRW# and ($ realWorld#). I have a basic idea of what they do, but I don't understand the practical consequences of using one over the other. Could somebody explain the difference?

Best answer, by K. A. Buhr:

Consider a simplified bytestring library. You might have a byte string type consisting of a length and an allocated buffer of bytes:

data BS = BS !Int !(ForeignPtr Word8)

To create a bytestring, you would generally need to use an IO action:

create :: Int -> (Ptr Word8 -> IO ()) -> IO BS
{-# INLINE create #-}
create n f = do
  p <- mallocForeignPtrBytes n
  withForeignPtr p $ f
  return $ BS n p
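
As a minimal sketch of how create might be used while staying in IO, the snippet below fills a buffer from a list and copies it back out. The helpers fromList and toList are hypothetical names introduced for illustration only; they are not part of the answer or of the bytestring library.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Ptr (Ptr)

-- Same simplified type and constructor function as in the answer.
data BS = BS !Int !(ForeignPtr Word8)

create :: Int -> (Ptr Word8 -> IO ()) -> IO BS
create n f = do
  p <- mallocForeignPtrBytes n
  withForeignPtr p f
  return (BS n p)

-- Hypothetical helper: allocate a buffer and fill it from a list of bytes.
fromList :: [Word8] -> IO BS
fromList ws = create (length ws) (\p -> pokeArray p ws)

-- Hypothetical helper: copy the buffer contents back out as a list.
toList :: BS -> IO [Word8]
toList (BS n p) = withForeignPtr p (peekArray n)
```

Loaded into GHCi, fromList [1, 2, 3] >>= toList should return the same three bytes. Every step lives in IO, which is exactly the inconvenience the unsafe variants are meant to remove.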

It's not all that convenient to work in the IO monad, though, so you might be tempted to do a little unsafe IO:

unsafeCreate :: Int -> (Ptr Word8 -> IO ()) -> BS
{-# INLINE unsafeCreate #-}
unsafeCreate n f = myUnsafePerformIO $ create n f

Given the extensive inlining in your library, it would be nice to inline the unsafe IO, for best performance:

myUnsafePerformIO :: IO a -> a
{-# INLINE myUnsafePerformIO #-}
myUnsafePerformIO (IO m) = case m realWorld# of (# _, r #) -> r

But, after you add a convenience function for generating singleton bytestrings:

singleton :: Word8 -> BS
{-# INLINE singleton #-}
singleton x = unsafeCreate 1 (\p -> poke p x)

you might be surprised to discover that the following program prints True:

{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import GHC.IO
import GHC.Prim
import Foreign

data BS = BS !Int !(ForeignPtr Word8)

create :: Int -> (Ptr Word8 -> IO ()) -> IO BS
{-# INLINE create #-}
create n f = do
  p <- mallocForeignPtrBytes n
  withForeignPtr p $ f
  return $ BS n p

unsafeCreate :: Int -> (Ptr Word8 -> IO ()) -> BS
{-# INLINE unsafeCreate #-}
unsafeCreate n f = myUnsafePerformIO $ create n f

myUnsafePerformIO :: IO a -> a
{-# INLINE myUnsafePerformIO #-}
myUnsafePerformIO (IO m) = case m realWorld# of (# _, r #) -> r

singleton :: Word8 -> BS
{-# INLINE singleton #-}
singleton x = unsafeCreate 1 (\p -> poke p x)

main :: IO ()
main = do
  let BS _ p = singleton 1
      BS _ q = singleton 2
  print $ p == q

which is a problem if you expect two different singletons to use two different buffers.

The problem is that, after extensive inlining, the optimizer sees the two mallocForeignPtrBytes 1 calls in singleton 1 and singleton 2 as identical expressions and can float them out into a single allocation, with the pointer shared between the two bytestrings.
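
To make the consequence of a shared allocation concrete, here is a hand-written model of it in ordinary IO. This is not GHC output; sharedSingletons and firstByte are hypothetical names, and the sketch assumes both writes run, in the order shown.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (peek, poke)

data BS = BS !Int !(ForeignPtr Word8)

-- Hand-written model of the shared allocation: one buffer, two bytestrings.
-- This only illustrates the aliasing; it is not what GHC literally emits.
sharedSingletons :: Word8 -> Word8 -> IO (BS, BS)
sharedSingletons x y = do
  p <- mallocForeignPtrBytes 1             -- the single, shared allocation
  withForeignPtr p (\ptr -> poke ptr x)    -- write for the first singleton
  withForeignPtr p (\ptr -> poke ptr y)    -- write for the second one overwrites x
  return (BS 1 p, BS 1 p)

-- Read the first byte of a bytestring's buffer.
firstByte :: BS -> IO Word8
firstByte (BS _ p) = withForeignPtr p peek
```

With this model, both results of sharedSingletons 1 2 read back as 2, because the two values alias the same byte. That is why a shared buffer is more than a cosmetic pointer-equality issue.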

If you were to remove the inlining from any of these functions, then the floating would be prevented, and the program would print False as expected. Alternatively, you could make the following change to myUnsafePerformIO:

myUnsafePerformIO :: IO a -> a
{-# INLINE myUnsafePerformIO #-}
myUnsafePerformIO (IO m) = case myRunRW# m of (# _, r #) -> r

myRunRW# :: forall (r :: RuntimeRep) (o :: TYPE r).
            (State# RealWorld -> o) -> o
{-# NOINLINE myRunRW# #-}
myRunRW# m = m realWorld#

replacing the inlined application m realWorld# with a call to the non-inlined function myRunRW#, defined as myRunRW# m = m realWorld#. This is the smallest piece of code that, when kept out of line, prevents the allocation calls from being floated out.

After this change, the program will print False as expected.

This is all that switching from inlinePerformIO (AKA accursedUnutterablePerformIO) to unsafeDupablePerformIO does. It changes that function call m realWorld# from an inlined expression to an equivalent noninlined runRW# m = m realWorld#:

unsafeDupablePerformIO  :: IO a -> a
unsafeDupablePerformIO (IO m) = case runRW# m of (# _, a #) -> a

runRW# :: forall (r :: RuntimeRep) (o :: TYPE r).
          (State# RealWorld -> o) -> o
{-# NOINLINE runRW# #-}
runRW# m = m realWorld#

The one exception is that the built-in runRW# is magic: even though it is marked NOINLINE, the compiler does inline it, but only near the end of compilation, once the allocation calls can no longer be floated out.

So, you get the performance benefit of having the unsafeDupablePerformIO call fully inlined without the undesirable side effect of that inlining allowing common expressions in different unsafe calls to be floated to a common single call.
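
As a sketch of that approach, the simplified library can call unsafeDupablePerformIO from System.IO.Unsafe directly instead of a hand-rolled wrapper. The helper sameBuffer is a hypothetical name added for the check; the rest mirrors the answer's definitions.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr)
import Foreign.Storable (poke)
import System.IO.Unsafe (unsafeDupablePerformIO)

data BS = BS !Int !(ForeignPtr Word8)

create :: Int -> (Ptr Word8 -> IO ()) -> IO BS
{-# INLINE create #-}
create n f = do
  p <- mallocForeignPtrBytes n
  withForeignPtr p f
  return (BS n p)

-- Uses the library's runRW#-based wrapper rather than m realWorld#.
unsafeCreate :: Int -> (Ptr Word8 -> IO ()) -> BS
{-# INLINE unsafeCreate #-}
unsafeCreate n f = unsafeDupablePerformIO (create n f)

singleton :: Word8 -> BS
{-# INLINE singleton #-}
singleton x = unsafeCreate 1 (\p -> poke p x)

-- Hypothetical helper: True when both bytestrings point at the same buffer.
sameBuffer :: BS -> BS -> Bool
sameBuffer (BS _ p) (BS _ q) = p == q
```

Assuming GHC behaves as described above, sameBuffer (singleton 1) (singleton 2) should evaluate to False even with optimization enabled, since each call keeps its own allocation.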

Though, truth be told, there is a cost. When accursedUnutterablePerformIO works correctly, it can potentially give slightly better performance because there are more opportunities for optimization if the m realWorld# call can be inlined earlier rather than later. So, the actual bytestring library still uses accursedUnutterablePerformIO internally in lots of places, in particular where there's no allocation going on (e.g., head uses it to peek the first byte of the buffer).
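
For illustration, a head-like function on the simplified BS type might look as follows. This is a sketch, not the library's actual code: myUnsafePerformIO is the inlined wrapper from earlier in this answer, while bsHead and fromByte are hypothetical names. The import of realWorld# from GHC.Exts is an assumption about where a current base exposes it.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)
import GHC.Exts (realWorld#)
import GHC.IO (IO (..))

data BS = BS !Int !(ForeignPtr Word8)

create :: Int -> (Ptr Word8 -> IO ()) -> IO BS
create n f = do
  p <- mallocForeignPtrBytes n
  withForeignPtr p f
  return (BS n p)

-- The fully inlined wrapper, with the same definition as
-- accursedUnutterablePerformIO quoted in the question.
myUnsafePerformIO :: IO a -> a
{-# INLINE myUnsafePerformIO #-}
myUnsafePerformIO (IO m) = case m realWorld# of (# _, r #) -> r

-- Hypothetical helper: build a one-byte bytestring in IO.
fromByte :: Word8 -> IO BS
fromByte x = create 1 (\p -> poke p x)

-- Hypothetical head: the unsafe block only reads from an existing buffer
-- and allocates nothing, so there is no allocation to be shared by mistake.
bsHead :: BS -> Word8
bsHead (BS n p)
  | n <= 0    = error "bsHead: empty bytestring"
  | otherwise = myUnsafePerformIO (withForeignPtr p peek)
```

Here the worst that merging two identical reads can do is read the same byte once instead of twice, which is harmless as long as the buffer is never mutated after creation.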