Reading first row from a csv file with pipes-csv


I am reading a CSV file with the pipes-csv library. I want to read the first line now and the rest later. Unfortunately, after Pipes.Prelude.head returns, the pipe seems to be closed somehow. Is there a way to read the header of the CSV first and the remaining rows later?

import qualified Data.Vector as V
import Pipes
import qualified Pipes.Prelude as P
import qualified System.IO as IO
import qualified Pipes.ByteString as PB
import qualified Data.Text as Text
import qualified Pipes.Csv as PCsv
import Control.Monad (forever)

showPipe :: Proxy () (Either String (V.Vector Text.Text)) () String IO b
showPipe = forever $ do
    x <- await  -- no annotation needed (it would require ScopedTypeVariables); the signature of showPipe fixes the type
    yield $ show x


main :: IO ()
main = do
  IO.withFile "./test.csv"
              IO.ReadMode
              (\handle -> do
                  let producer = (PCsv.decode PCsv.NoHeader (PB.fromHandle handle))
                  headers <- P.head producer
                  putStrLn "Header"
                  putStrLn $ show headers
                  putStrLn $ "Rows"
                  runEffect ( producer>->
                              (showPipe) >->
                              P.stdoutLn)
               )

If we do not read the header first, we can read whole csv without any problem:

main :: IO ()
main = do
  IO.withFile "./test.csv"
              IO.ReadMode
              (\handle -> do
                  let producer = (PCsv.decode PCsv.NoHeader (PB.fromHandle handle))
                  putStrLn $ "Rows"
                  runEffect ( producer>->
                              (showPipe) >->
                              P.stdoutLn)
               )

Answer by Michael (best answer)

Pipes.Csv has its own support for headers, but I think this question is really about a more sophisticated use of Pipes.await or Pipes.next. First, next:

>>> :t Pipes.next 
Pipes.next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r))

next is the basic way of inspecting a producer. It is a bit like pattern matching on a list. With a list the two possibilities are [] and x:xs; here they are Left () and Right (headers, rows). The pair in the Right case is what you are looking for. Since the producer is effectful, an action (here in IO) is needed to get hold of it:

main :: IO ()
main = do
  handle <- IO.openFile  "./test.csv" IO.ReadMode
  let producer :: Producer (V.Vector Text.Text) IO ()
      producer = PCsv.decode PCsv.NoHeader (PB.fromHandle handle)  >-> P.concat
  e <- next producer
  case e of
    Left () -> putStrLn "No lines!"
    Right (headers, rows) -> do
      putStrLn "Header"
      print headers
      putStrLn $ "Rows"
      runEffect ( rows >-> P.print)
  IO.hClose handle

Since the Either values are a distraction here, I eliminate the Left values (the lines that don't parse) with P.concat.
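To try next and P.concat without a file, here is a minimal sketch on an in-memory producer. The names decoded, goodRows and splitHeader are made up for illustration, and the list of rows is a hypothetical stand-in for what PCsv.decode would yield; only each, next, P.concat and P.toListM come from pipes.

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Hypothetical stand-in for decoded CSV rows:
-- Right is a parsed row, Left is a parse error.
decoded :: Monad m => Producer (Either String [String]) m ()
decoded = each
  [ Right ["name", "grade"]
  , Left "parse error"
  , Right ["alice", "90"]
  ]

-- P.concat flattens each Either like a zero-or-one element container,
-- so Left values are dropped and Right values are unwrapped.
goodRows :: Monad m => Producer [String] m ()
goodRows = decoded >-> P.concat

-- Split a producer into its first element and the remaining elements.
-- Nothing means the producer was empty.
splitHeader :: Monad m => Producer a m () -> m (Maybe (a, [a]))
splitHeader p = do
  e <- next p
  case e of
    Left ()         -> return Nothing
    Right (h, rest) -> do
      rs <- P.toListM rest   -- drain what is left after the first element
      return (Just (h, rs))
```

In GHCi, splitHeader goodRows should give Just (["name","grade"],[["alice","90"]]). Because rest is the same producer resumed after its first element, nothing is read twice. By contrast, running a handle-backed producer a second time after P.head most likely finds the handle already advanced past the data, which would explain the behaviour in the question.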

next does not act inside a pipeline, but directly on the Producer, which it treats as a sort of "effectful list" with a final return value at the end. The particular effect we got above can of course be achieved with await, which acts inside a pipeline. I can use it to intercept the first item that comes along in a pipeline, do some IO based on it, and then forward the remaining elements:

main :: IO ()
main = do
  handle <- IO.openFile "./test.csv" IO.ReadMode
  let producer :: Producer (V.Vector Text.Text) IO ()
      producer = PCsv.decode PCsv.NoHeader (PB.fromHandle handle)  >-> P.concat
      handleHeader :: Pipe (V.Vector Text.Text) (V.Vector Text.Text) IO ()
      handleHeader = do
        headers <- await  -- intercept first value
        liftIO $ do       -- use it for IO
          putStrLn "Header"
          print headers
          putStrLn $ "Rows"
        cat               -- pass along all later values
  runEffect (producer >-> handleHeader >-> P.print)
  IO.hClose handle

The only difference is that if producer is empty, the pipeline simply ends at the await, so this version cannot report the empty case the way the previous program does with No lines!
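A small pure sketch makes the empty case visible. It assumes a simplified variant of handleHeader, here called labelHeader (a made-up name), that marks the first element by yielding a labelled string instead of doing IO, so the result can be inspected with P.toList.

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Simplified variant of handleHeader: label the first element,
-- then forward everything after it unchanged.
labelHeader :: Monad m => Pipe String String m r
labelHeader = do
  h <- await                 -- intercept the first value
  yield ("Header: " ++ h)    -- mark it instead of printing it
  cat                        -- pass along all later values

-- With input, the first element is labelled:
withRows :: [String]
withRows = P.toList (each ["name,grade", "alice,90"] >-> labelHeader)
-- ["Header: name,grade","alice,90"]

-- With no input, the pipeline ends at the await and nothing after it runs:
noRows :: [String]
noRows = P.toList (each ([] :: [String]) >-> labelHeader)
-- []
```

If the empty case has to be reported, the next-based version is the one to use, since it is the only one that gets to see the Left ().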

Note, by the way, that showPipe can be defined as P.map show, or simply as P.show, keeping the specialized type signature you gave it.
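As a sketch, with a more general type than the one in the question (the specialized signature works just as well), the two definitions look like this. The name showPipe' is only there to have both side by side.

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- showPipe via P.map: apply show to every element passing through.
showPipe :: (Monad m, Show a) => Pipe a String m r
showPipe = P.map show

-- The same pipe using the ready-made P.show.
showPipe' :: (Monad m, Show a) => Pipe a String m r
showPipe' = P.show

-- Example on in-memory values standing in for decoded rows.
rendered :: [String]
rendered = P.toList (each [Right 1, Left "x" :: Either String Int] >-> showPipe)
-- ["Right 1","Left \"x\""]
```

Neither version needs forever, since P.map and P.show already loop over the whole stream.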