IO vs referential transparency

1.2k views Asked by At

Sorry newb question here, but how does Haskell know not to apply referential transparency to e.g. readLn or when putStrLn-ing a same string twice? Is it because IO is involved? IOW, will not the compiler apply referential transparency to functions returning IO?

4

There are 4 answers

1
Anler On BEST ANSWER

The IO type is defined as:

newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))

Notice is very similar to the State Monad where the state is the state of the real world, so imho you can think of IO as referentially transparent and pure, what's impure is the Haskell runtime (interpreter) that runs your IO actions (the algebra).

Take a look at the Haskell wiki, it explains IO in greater detail: IO Inside

0
Davislor On

Sort of. You’ve probably heard that IO is a monad, which means a value wrapped in it has to use monadic operations, such as bind (>>=), sequential composition (>>) and return. So, you could write a greeting program like this:

prompt :: IO String
prompt = putStrLn "Who are you?" >>
         getLine >>=
         \name ->
           return $ "Hello, " ++ name ++ "!"

main :: IO ()
main = prompt >>=
       putStrLn

You’re more likely to see this with the equivalent do notation, which is just another way of writing the exact same program. In this case, though, I think the unsugared version makes it clearer that the computation is a series of statements chained together with >> and >>=, where we use >> when we want to throw out the result of the previous step, and >>= when we want to pass a result to the next function in the chain. If we need to give that result a name, we can capture it as a parameter, like in the lambda expression \name -> inside prompt. If we need to lift a simple String into an IO String, we use return.

The equivalent in do notation, by the way, is:

prompt :: IO String
prompt = do
  putStrLn "Who are you?"
  name <- getLine
  return $ "Hello, " ++ name ++ "!"

main :: IO ()
main = do
  message <- prompt
  putStrLn message

So how does it know that main, which returns nothing, is not referentially transparent, and that prompt, which returns an IO String, is not either? There is something special to IO, or at least something IO lacks: with many other monads, such as State and Maybe, there is a way to do a lazy computation inside the monad and discard the wrapper, getting a pure value back out. You can declare a State Int monad, do deterministic, sequential, stateful computations inside it for a while, then use evalState to get the pure Int result back. You can do a Maybe Char computation, like searching for a character in a string, check that it worked, and if so, read the pure Char back out.

With IO, you cannot do this. If you have an IO String, all you can do with it is bind it to an IO function that takes a String argument, such as PutStrLn, or pass it to a function that takes an IO String argument. If you call prompt a second time, it won’t silently give you the same result; it will actually run again. If you tell it to pause for a few milliseconds, it won’t lazily wait until you need some return value later in the program to do that. If it returns an empty value like IO (), the compiler will not optimize it by just returning that constant.

The way this works internally is to wrap the object together with a state-of-the-world parameter that is different for each call. That means that two different calls to getLine depend on different states of the world, and the return value of main requires computing the final state of the world, which depends on all previous IO operations.

0
arrowd On

Due to return values are wrapped into IO, you can't reuse them unless you "pull" them out, effectively running the IO action:

readLn :: IO String

twoLines = readLn ++ readLn -- can't do this, because (++) works on String's, not IO String's

twoLines' = do
  l1 <- readLn
  l2 <- readLn -- "pulling" out of readLn causes console input to be read again
  return (l1 ++ l2) -- so l1 and l2 have different values, and this works
0
MathematicalOrchid On

You need to distinguish between evaluation and execution.

If you evaluate 2 + 7, the result is 9. If you replace one expression that evaluates to 9 with another, different expression that also evaluates to 9, then the meaning of the program has not changed. This is what referential transparency guarantees. We could common up several shared expressions that reduce to 9, or duplicate a shared expression into multiple copies, and the meaning of the program does not change. (Performance might, but not the end result.)

If you evaluate readLn, it evaluates to an I/O command object. You can imagine it as being a data structure that describes what I/O operation(s) you want performed. But the object itself is just data. If you evaluate readLn twice, it returns the same I/O command object twice. You can merge several copies into one; you can split one copy into several. It doesn't change the meaning of the program.

Now if you want to execute the I/O action, that's a different thing. Clearly, I/O operations need to be executed in exactly the way the program specifies, without being randomly duplicated or rearranged. But that's OK, because it's not the Haskell expression evaluation engine that does that. You can pretend that the Haskell runtime runs main, which builds a giant I/O command object representing the entire program. The Haskell runtime then reads this data structure and executes the I/O operations it requests, in the order specified. (Not actually how it works, but a useful mental model.)

Ordinarily you don't need to bother thinking about this strict separation between evaluating readLn to get an I/O command object and then executing the resulting I/O command object to get a result. But strictly that's notionally what it does.

(You may also have heard that I/O "forms a monad". That's merely a fancy way of saying that there's a particular set of operators for changing I/O command objects together into bigger I/O command objects. It's not central to understanding the separation between evaluate and execute.)