Lazy output from monadic action


I have the following monad transformer:

newtype Pdf' m a = Pdf' {
  unPdf' :: StateT St (Iteratee ByteString m) a
  }
type Pdf m = ErrorT String (Pdf' m)

Basically, it uses an underlying Iteratee that reads and processes the PDF document (this requires a random-access source, so that the whole document does not have to be kept in memory all the time).

I need to implement a function that saves the PDF document, and I want it to be lazy: it should be possible to save a document in constant memory.

I can produce a lazy ByteString:

import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy as BS

save :: Monad m => Pdf m ByteString
save = do
  -- actually it is a loop
  str1 <- serializeTheFirstObject
  storeOffsetForTheFirstObject (BS.length str1)
  str2 <- serializeTheSecondObject
  storeOffsetForTheSecondObject (BS.length str2)
  ...
  strn <- serializeTheNthObject
  storeOffsetForTheNthObject (BS.length strn)
  table <- dumpRefTable
  -- parentheses needed: without them this parses as
  -- (return mconcat [...]) `mappend` table
  return (mconcat [str1, str2, ..., strn] `mappend` table)

But the actual output can depend on previous output. (Details: a PDF document contains a so-called "reference table" holding the absolute byte offset of every object in the document. These offsets depend on the length of the ByteString each PDF object is serialized to.)
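To make the dependency concrete, here is a small sketch (the names `layout` and the sample objects are hypothetical, not from the question) showing that each object's offset is just the running sum of the lengths of everything written before it. Only the lengths matter, not the contents, which suggests a streaming writer needs to carry a byte counter rather than the bytes themselves.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Int (Int64)

-- | Given the length of whatever precedes the first object (e.g. the
-- "%PDF-1.4" header line) and the serialized objects in file order,
-- return the start offset of each object and the offset where the
-- reference table itself would begin.
layout :: Int64 -> [BL.ByteString] -> ([Int64], Int64)
layout headerLen objs = (init positions, last positions)
  where
    -- scanl always yields a non-empty list, so init/last are safe
    positions = scanl (+) headerLen (map BL.length objs)

main :: IO ()
main = do
  let header = BL.pack "%PDF-1.4\n"                -- 9 bytes
      objs   = [ BL.pack "1 0 obj\n<< >>\nendobj\n"  -- 21 bytes
               , BL.pack "2 0 obj\n(hi)\nendobj\n" ] -- 20 bytes
  print (layout (BL.length header) objs)           -- prints ([9,30],50)
```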

How can I ensure that the save function does not force the entire ByteString before returning it to the caller?

Would it be better to take a callback as an argument and call it every time I have something to output?

import Data.ByteString (ByteString)
save :: Monad m => (ByteString -> Pdf m ()) -> Pdf m ()
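A minimal sketch of the callback approach, assuming (for simplicity) that the objects are already available as a lazily produced list of strict ByteStrings; `saveWith` is a hypothetical name. Each chunk is passed to `emit` as soon as it exists, and only a running byte count plus the list of offsets is retained, so memory grows with the number of objects rather than their size.

```haskell
import Control.Monad (foldM)
import qualified Data.ByteString.Char8 as B
import System.IO (stdout)

-- | Hypothetical callback-style writer. Every chunk is handed to 'emit'
-- immediately; the result is the absolute offset of each object,
-- ready to be used for the reference table.
saveWith :: Monad m => (B.ByteString -> m ()) -> [B.ByteString] -> m [Int]
saveWith emit objs = do
  let header = B.pack "%PDF-1.4\n"
  emit header
  (_, offsets) <- foldM step (B.length header, []) objs
  return (reverse offsets)
  where
    -- 'pos' is the number of bytes emitted so far; force it so a long
    -- chain of (+) thunks does not build up
    step (pos, acc) obj = do
      emit obj
      let pos' = pos + B.length obj
      pos' `seq` return (pos', pos : acc)

main :: IO ()
main = do
  offs <- saveWith (B.hPut stdout) [B.pack "1 0 obj\n<< >>\nendobj\n"]
  print offs
```

Because `saveWith` is polymorphic in `m`, the callback could just as well be an action in the question's `Pdf m` monad instead of a write to a handle.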

Is there a better solution?



Yuras (accepted answer)

The solution I have found so far is to use a Coroutine (from the monad-coroutine package). Example:

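import Control.Monad.Coroutine (Coroutine, resume, suspend)
import Control.Monad.Coroutine.SuspensionFunctors (Yield (..))

-- Yields "Hello World\n" i times, suspending after each chunk.
proc :: Int -> Coroutine (Yield String) IO ()
proc 0 = return ()
proc i = suspend $ Yield "Hello World\n" (proc $ i - 1)

main :: IO ()
main = go (proc 10)
  where
    -- The caller drives the coroutine: each resume runs it up to the next Yield.
    go cr = do
      r <- resume cr
      case r of
        Right () -> return ()
        Left (Yield str cont) -> do
          putStr str
          go cont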

It does the same work as a callback, but the caller has full control over output generation.
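To show how the same idea applies to the PDF case, here is a self-contained sketch that does not depend on monad-coroutine: `Producer` is a hypothetical, hand-rolled type with the same shape as `Coroutine (Yield o) m r` (finished with a result, or one chunk plus the action computing the rest). The producer threads the byte offset through its steps and finishes with the object offsets; the consumer decides when, and whether, to pull the next chunk.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Hypothetical minimal producer: either done with a result, or one
-- output chunk plus the monadic action that yields the remainder.
data Producer o m r = Done r | Chunk o (m (Producer o m r))

-- | Yield the header and each object in turn, tracking the byte
-- position; finish with the start offset of every object. In real code
-- each step would run a serialization action instead of 'return'.
saveP :: Monad m => [B.ByteString] -> Producer B.ByteString m [Int]
saveP objs = Chunk header (go (B.length header) [] objs)
  where
    header = B.pack "%PDF-1.4\n"
    go _ acc [] = return (Done (reverse acc))
    go pos acc (o : os) = return (Chunk o (go (pos + B.length o) (pos : acc) os))

-- | The caller drives generation: nothing past the current chunk is
-- computed until it asks for the next one.
drain :: Monad m => (o -> m ()) -> Producer o m r -> m r
drain _ (Done r) = return r
drain out (Chunk o rest) = out o >> rest >>= drain out

main :: IO ()
main = do
  offs <- drain B.putStr (saveP [B.pack "1 0 obj\n<< >>\nendobj\n"])
  print offs
```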

Chris Kuklewicz

To build this in one pass you will need to store (perhaps in the state) where your indirect objects have been written. So save needs to keep track of the absolute byte position as it works; I have not considered whether your Pdf monad is suitable for this task. When you get to the end, you can use the addresses stored in the state to create the xref section.

I do not think a two-pass algorithm will help.
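A one-pass sketch of this, assuming a plain `StateT` over `IO` in place of the question's `Pdf` monad; `W`, `PdfWriter`, `emit`, `emitObject`, `xrefSection` and `document` are hypothetical names. The state holds only the current byte position and the object offsets. The xref entry layout (10-digit offset, 5-digit generation, keyword, two-byte end of line, 20 bytes in all) follows the PDF format; the trailer and `startxref` are omitted here.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, execStateT, get, modify', put)
import qualified Data.ByteString.Char8 as B
import Text.Printf (printf)

-- | Hypothetical writer state: bytes written so far, and the start
-- offsets of objects 1..n (most recent first).
data W = W { position :: !Int, offsetsRev :: [Int] }

type PdfWriter = StateT W IO

-- | Write a chunk straight to stdout and advance the byte counter.
emit :: B.ByteString -> PdfWriter ()
emit bs = do
  lift (B.putStr bs)
  modify' (\w -> w { position = position w + B.length bs })

-- | Record where the next object starts, then write it.
emitObject :: B.ByteString -> PdfWriter ()
emitObject obj = do
  w <- get
  put w { offsetsRev = position w : offsetsRev w }
  emit obj

-- | One in-use xref entry: offset, generation 00000, keyword 'n', " \n".
xrefEntry :: Int -> String
xrefEntry = printf "%010d 00000 n \n"

-- | The xref section for objects 1..n, preceded by the free entry for
-- object 0.
xrefSection :: [Int] -> String
xrefSection offs =
  "xref\n0 " ++ show (length offs + 1) ++ "\n0000000000 65535 f \n"
    ++ concatMap xrefEntry offs

-- | Hypothetical document body; real objects would come from the Pdf monad.
document :: PdfWriter ()
document = do
  emit (B.pack "%PDF-1.4\n")
  emitObject (B.pack "1 0 obj\n<< >>\nendobj\n")
  emitObject (B.pack "2 0 obj\n(hi)\nendobj\n")

main :: IO ()
main = do
  w <- execStateT document (W 0 [])
  putStr (xrefSection (reverse (offsetsRev w)))
```

After `document` runs, `position w` is the offset at which the xref section starts, which is the value a `startxref` line would need.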

Edit June 6th: Perhaps I understand your goal better now. For very fast generation of documents, e.g. HTML, there are several libraries on Hackage with "blaze" in the name. The technique is to avoid using 'mconcat' on the ByteString and instead use it on an intermediate 'builder' type. The core library for this seems to be 'blaze-builder', which is used by 'blaze-html' and 'blaze-textual'.
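A short sketch of the builder technique, using `Data.ByteString.Builder` from the bytestring package (which grew out of blaze-builder) rather than blaze-builder itself; `renderObject` is a hypothetical helper. Objects are assembled as `Builder` values, which are cheap to append, and bytes are only produced when the builder is run, either straight to a handle or as a lazily generated lazy ByteString.

```haskell
import Data.ByteString.Builder (Builder, hPutBuilder, intDec, lazyByteString, string7, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as BL
import System.IO (stdout)

-- | Hypothetical object renderer: returns a Builder instead of a
-- concatenated ByteString, so nothing is copied until it is run.
renderObject :: Int -> BL.ByteString -> Builder
renderObject n body =
  intDec n <> string7 " 0 obj\n" <> lazyByteString body <> string7 "\nendobj\n"

main :: IO ()
main = do
  let doc = string7 "%PDF-1.4\n"
            <> mconcat [renderObject i (BL.pack "<< >>") | i <- [1 .. 3]]
  -- Run the builder directly against a handle ...
  hPutBuilder stdout doc
  -- ... or turn it into a lazy ByteString whose chunks are generated
  -- on demand as the result is consumed.
  print (BL.length (toLazyByteString doc))
```

Note that a `Builder` does not expose its length, so for the xref table the offsets still have to be counted separately, for example from each object's length before it is turned into a builder.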