Haskell return lazy string from file IO


Here I'm back again with a (for me) really strange behaviour of my newest masterpiece...

This code should read a file, but it doesn't:

readCsvContents :: String -> IO ( String )
readCsvContents fileName = do
     withFile fileName ReadMode (\handle -> do
          contents <- hGetContents handle
          return contents
          )

main = do
    contents <- readCsvContents "src\\EURUSD60.csv"
    putStrLn ("Read " ++ show (length contents) ++ " Bytes input data.")

The result is

Read 0 Bytes input data.

Now I changed the first function and added a putStrLn:

readCsvContents :: String -> IO ( String )
readCsvContents fileName = do
     withFile fileName ReadMode (\handle -> do
          contents <- hGetContents handle
          putStrLn ("hGetContents gave " ++ show (length contents) ++ " Bytes of input data.")
          return contents
          )

and the result is

hGetContents gave 3479360 Bytes of input data.
Read 3479360 Bytes input data.

WTF ??? Well, I know, Haskell is lazy. But I didn't know I had to kick it in the butt like this.

There are 2 answers

leftaroundabout (best answer)

You're right, this is a pain. For this reason, avoid the old standard file IO module, except when you simply want to read an entire file that won't change, as you are doing here; readFile handles that just fine.

readCsvContents :: FilePath -> IO String
readCsvContents fileName = do
   contents <- readFile fileName
   return contents

Note that, by the monad laws, this is exactly the same[1] as

readCsvContents = readFile
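
To spell out the monad-law step, here is a small sketch of the desugaring. The names readCsvVerbose and readCsvShort are made up for this illustration and are not from the question.

```haskell
-- do-notation is sugar for (>>=):
--   do { contents <- readFile fileName; return contents }
--     = readFile fileName >>= \contents -> return contents
--     = readFile fileName >>= return   -- eta-reduce the lambda
--     = readFile fileName              -- right identity: m >>= return = m

-- Hypothetical name: the long form, written with an explicit bind.
readCsvVerbose :: FilePath -> IO String
readCsvVerbose fileName = readFile fileName >>= return

-- Hypothetical name: the short form the law reduces it to.
readCsvShort :: FilePath -> IO String
readCsvShort = readFile
```

Both definitions behave identically; the second just drops the redundant bind.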

The problem with what you tried is that withFile closes the handle unconditionally as soon as the action passed to it returns, without checking whether lazy evaluation of contents has actually forced the file reads. That is of course horrible; I would never bother to use handles myself. readFile avoids the problem by leaving the handle open until the whole file has been consumed, or until the unconsumed result is garbage-collected; this isn't altogether nice either but often works quite well.
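
If you do want to stay with handles, one option is to finish the work inside the action, so that nothing lazy escapes before the handle is closed. A minimal sketch, assuming a hypothetical helper countCsvLines (the name and the line-counting task are made up for illustration):

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: count the lines of a file while it is still open.
countCsvLines :: FilePath -> IO Int
countCsvLines fileName =
  withFile fileName ReadMode $ \handle -> do
    contents <- hGetContents handle
    -- evaluate forces the Int here, which walks the whole string,
    -- so every read happens before withFile closes the handle.
    -- A plain `return (length (lines contents))` would hand back
    -- an unevaluated thunk and reintroduce the original problem.
    evaluate (length (lines contents))
```

Only the fully computed Int leaves the action, so it no longer matters when the handle is closed.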

For proper work with file IO, check out either the conduit or pipes library. The former focuses a bit more on performance, the latter more on elegance (but really, the difference isn't that big).


[1] And your first try is the same as readCsvContents fn = withFile fn ReadMode hGetContents.

Lubomír Sedlář

This is a problem with lazy IO. What happens in your code is that withFile opens the file and passes the handle to the lambda. The lambda returns a lazy list that stands for the contents of the file, without having read any of it yet. Then withFile notices that the callback has finished and closes the file.

Since the returned list is lazy, the file contents are only read when the list is evaluated, which happens in the call to length in main. By then the file handle is already closed, so nothing can be read from the file and length reports 0.

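The modified version forces the file contents inside the callback passed to withFile, because length has to walk the entire string to count it. At that point the file is still open, so the read succeeds, and the string that gets returned is already fully evaluated.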
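
The same forcing can be done without printing anything. A minimal sketch, assuming a hypothetical name readCsvContentsStrict and using evaluate from Control.Exception in place of the putStrLn:

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical variant of readCsvContents that reads the whole file
-- before withFile gets a chance to close the handle.
readCsvContentsStrict :: FilePath -> IO String
readCsvContentsStrict fileName =
  withFile fileName ReadMode $ \handle -> do
    contents <- hGetContents handle
    -- length traverses the string to its end, which triggers all
    -- the pending lazy reads while the file is still open.
    _ <- evaluate (length contents)
    return contents
```

This keeps the whole file in memory as a String, which is fine for a few megabytes but is a trade-off to keep in mind for larger inputs.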