I was wandering in the Restricted Section of the Haskell Library and found these two vile spells:
{- System.IO.Unsafe -}
unsafeDupablePerformIO :: IO a -> a
unsafeDupablePerformIO (IO m) = case runRW# m of (# _, a #) -> a
{- Data.ByteString.Internal -}
accursedUnutterablePerformIO :: IO a -> a
accursedUnutterablePerformIO (IO m) = case m realWorld# of (# _, r #) -> r
The actual difference seems to be just between runRW# and ($ realWorld#), however. I have some basic idea of what they are doing, but I don't understand the real consequences of using one over the other. Could somebody explain the difference?
Consider a simplified bytestring library. You might have a byte string type consisting of a length and an allocated buffer of bytes:
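As a minimal sketch of such a type (the names `BS` and `bsLength` are assumptions made for illustration, not taken from any real library):

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes)

-- Hypothetical simplified bytestring: a length paired with a pointer to
-- an allocated buffer of bytes.
data BS = BS !Int !(ForeignPtr Word8)

-- Number of bytes recorded in the bytestring.
bsLength :: BS -> Int
bsLength (BS n _) = n

main :: IO ()
main = do
  buf <- mallocForeignPtrBytes 4   -- allocate a 4-byte buffer
  print (bsLength (BS 4 buf))
```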
To create a bytestring, you would generally need to use an IO action:
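One way such an action might look, as a sketch: the hypothetical `create` allocates a buffer of the requested size and lets a caller-supplied function fill it. The names `BS` and `create` and the argument order are assumptions.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Ptr (Ptr)

-- Hypothetical simplified bytestring: length plus buffer.
data BS = BS !Int !(ForeignPtr Word8)

-- Allocate n bytes, let the caller fill the buffer, and wrap the result.
create :: Int -> (Ptr Word8 -> IO ()) -> IO BS
create n fill = do
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp fill
  return (BS n fp)

main :: IO ()
main = do
  BS n fp <- create 3 (\p -> pokeArray p [10, 20, 30])
  bytes <- withForeignPtr fp (peekArray n)   -- read the buffer back
  print bytes
```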
It's not all that convenient to work in the IO monad, though, so you might be tempted to do a little unsafe IO:
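A sketch of what that temptation might look like, assuming a hypothetical `unsafeCreate` built on a hand-rolled `myUnsafePerformIO` that follows the same pattern as the definitions quoted in the question. The names `BS`, `create`, and `unsafeCreate` are assumptions.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)
import GHC.Exts (realWorld#)
import GHC.IO (IO (..))

-- Hypothetical simplified bytestring: length plus buffer.
data BS = BS !Int !(ForeignPtr Word8)

create :: Int -> (Ptr Word8 -> IO ()) -> IO BS
create n fill = do
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp fill
  return (BS n fp)

-- Run an IO action by handing it the realWorld# token directly.
myUnsafePerformIO :: IO a -> a
myUnsafePerformIO (IO m) = case m realWorld# of (# _, r #) -> r

-- Pure wrapper around create: convenient, but unsafe.
unsafeCreate :: Int -> (Ptr Word8 -> IO ()) -> BS
unsafeCreate n fill = myUnsafePerformIO (create n fill)

main :: IO ()
main = do
  let BS n fp = unsafeCreate 1 (\p -> poke p 42)
  byte <- withForeignPtr fp peek   -- read back the single byte
  print (n, byte)
```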
Given the extensive inlining in your library, it would be nice to inline the unsafe IO, for best performance:
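For illustration, the inlined variant might be nothing more than the same definition with an INLINE pragma attached. This is a sketch, and the trivial `main` only shows that the function runs an action and returns its result.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (realWorld#)
import GHC.IO (IO (..))

-- Same shape as accursedUnutterablePerformIO in the question, but
-- explicitly marked for inlining at every call site.
myUnsafePerformIO :: IO a -> a
{-# INLINE myUnsafePerformIO #-}
myUnsafePerformIO (IO m) = case m realWorld# of (# _, r #) -> r

main :: IO ()
main = print (myUnsafePerformIO (return (6 * 7 :: Int)))
```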
But, after you add a convenience function for generating singleton bytestrings, you might be surprised to discover that a program comparing the buffer pointers of singleton 1 and singleton 2 prints True, which is a problem if you expect two different singletons to use two different buffers.
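The scenario just described can be sketched as one self-contained program. The names `BS`, `create`, and `unsafeCreate` and the exact pragmas are assumptions; `singleton` and `myUnsafePerformIO` are the names used in this discussion. Whether it actually prints True is likely to depend on the GHC version and optimization flags, so treat the result as something to check rather than a guarantee.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr)
import Foreign.Storable (poke)
import GHC.Exts (realWorld#)
import GHC.IO (IO (..))

-- Hypothetical simplified bytestring: length plus buffer.
data BS = BS !Int !(ForeignPtr Word8)

create :: Int -> (Ptr Word8 -> IO ()) -> IO BS
{-# INLINE create #-}
create n fill = do
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp fill
  return (BS n fp)

-- Fully inlined unsafe IO: applies the action to realWorld# in place.
myUnsafePerformIO :: IO a -> a
{-# INLINE myUnsafePerformIO #-}
myUnsafePerformIO (IO m) = case m realWorld# of (# _, r #) -> r

unsafeCreate :: Int -> (Ptr Word8 -> IO ()) -> BS
{-# INLINE unsafeCreate #-}
unsafeCreate n fill = myUnsafePerformIO (create n fill)

-- Convenience function: a one-byte bytestring.
singleton :: Word8 -> BS
{-# INLINE singleton #-}
singleton x = unsafeCreate 1 (\p -> poke p x)

main :: IO ()
main = do
  let BS _ p = singleton 1
      BS _ q = singleton 2
  -- Compares the two buffer pointers; True means the buffer is shared.
  print (p == q)
```

Compiling with optimization enabled (for example `ghc -O2`) is what gives the inliner and the float-out pass a chance to merge the two allocations.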
What's going wrong here is that the extensive inlining means that the two mallocForeignPtrBytes 1 calls in singleton 1 and singleton 2 can be floated out into a single allocation, with the pointer shared between the two bytestrings.
If you were to remove the inlining from any of these functions, then the floating would be prevented, and the program would print False as expected. Alternatively, you could change myUnsafePerformIO by substituting the inline m realWorld# application with a non-inlined function call to myRunRW# m = m realWorld#. This is the minimal chunk of code that, if not inlined, can prevent the allocation calls from being lifted. After this change, the program will print False as expected.
This is all that switching from inlinePerformIO (AKA accursedUnutterablePerformIO) to unsafeDupablePerformIO does. It changes the function call m realWorld# from an inlined expression to an equivalent non-inlined runRW# m = m realWorld#.
Except, the built-in runRW# is magic. Even though it's marked NOINLINE, it is actually inlined by the compiler, but near the end of compilation, after the allocation calls have already been prevented from floating.
So, you get the performance benefit of having the unsafeDupablePerformIO call fully inlined without the undesirable side effect of that inlining allowing common expressions in different unsafe calls to be floated to a common single call.
Though, truth be told, there is a cost. When accursedUnutterablePerformIO works correctly, it can potentially give slightly better performance because there are more opportunities for optimization if the m realWorld# call can be inlined earlier rather than later. So, the actual bytestring library still uses accursedUnutterablePerformIO internally in lots of places, in particular where there's no allocation going on (e.g., head uses it to peek the first byte of the buffer).
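To make the myRunRW# change discussed above concrete, here is a sketch of the toy library with the m realWorld# application routed through a NOINLINE helper. As before, `BS`, `create`, and `unsafeCreate` are assumed names, and the type signature given to `myRunRW#` is my own reading of the one-line definition in the text.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr)
import Foreign.Storable (poke)
import GHC.Exts (RealWorld, State#, realWorld#)
import GHC.IO (IO (..))

-- Hypothetical simplified bytestring: length plus buffer.
data BS = BS !Int !(ForeignPtr Word8)

create :: Int -> (Ptr Word8 -> IO ()) -> IO BS
{-# INLINE create #-}
create n fill = do
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp fill
  return (BS n fp)

-- The only piece kept out of line: applying the action to realWorld#.
myRunRW# :: (State# RealWorld -> (# State# RealWorld, a #))
         -> (# State# RealWorld, a #)
{-# NOINLINE myRunRW# #-}
myRunRW# m = m realWorld#

-- Still inlined itself, but the state token is now hidden behind myRunRW#.
myUnsafePerformIO :: IO a -> a
{-# INLINE myUnsafePerformIO #-}
myUnsafePerformIO (IO m) = case myRunRW# m of (# _, r #) -> r

unsafeCreate :: Int -> (Ptr Word8 -> IO ()) -> BS
{-# INLINE unsafeCreate #-}
unsafeCreate n fill = myUnsafePerformIO (create n fill)

singleton :: Word8 -> BS
{-# INLINE singleton #-}
singleton x = unsafeCreate 1 (\p -> poke p x)

main :: IO ()
main = do
  let BS _ p = singleton 1
      BS _ q = singleton 2
  -- Compares the two buffer pointers; the discussion above expects False.
  print (p == q)
```

Replacing `myRunRW#` with `runRW#` (exported from GHC.Exts) should give the same shape as the unsafeDupablePerformIO definition quoted in the question.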