How to parse arbitrary lists with Haskell parsers?

410 views Asked by At

Is it possible to use one of the parsing libraries (e.g. Parsec) for parsing something different than a String? And how would I do this?

For the sake of simplicity, let's assume the input is a list of ints [Int]. The task could be

  • drop leading zeros
  • parse the rest into the pattern (S+L+)*, where S is a number less than 10, and L is a number larger or equal to ten.
  • return a list of tuples (Int,Int), where fst is the product of the S and snd is the product of the L integers

It would be great if someone could show how to write such a parser (or something similar).

2

There are 2 answers

0
Hans Lub On BEST ANSWER

Yes, as user5402 points out, Parsec can parse any instance of Stream, including arbitrary lists. As there are no predefined token parsers (as there are for text) you have to roll your own, (myToken below) using e.g. tokenPrim

The only thing I find a bit awkward is the handling of "source positions". SourcePos is an abstract type (rather than a type class) and forces me to use its "filename/line/column" format, which feels a bit unnatural here.

Anyway, here is the code (without the skipping of leading zeroes, for brevity)

import Text.Parsec

myToken ::  (Show a) => (a -> Bool) -> Parsec [a] () a
myToken test = tokenPrim show incPos $ justIf test where
  incPos pos _ _ = incSourceColumn pos 1
  justIf test x = if (test x) then Just x else Nothing

small = myToken  (< 10)
large = myToken  (>= 10)

smallLargePattern = do
  smallints <- many1 small
  largeints <- many1 large
  let prod = foldl1 (*)
  return (prod smallints, prod largeints)

myIntListParser :: Parsec [Int] () [(Int,Int)]
myIntListParser = many smallLargePattern

testMe :: [Int] -> [(Int, Int)]
testMe xs = case parse myIntListParser "your list" xs of
  Left err -> error $ show err
  Right result -> result

Trying it all out:

*Main> testMe [1,2,55,33,3,5,99]
[(2,1815),(15,99)]
*Main> testMe [1,2,55,33,3,5,99,1]
*** Exception: "your list" (line 1, column 9):
unexpected end of input

Note the awkward line/column format in the error message

Of course one could write a function sanitiseSourcePos :: SourcePos -> MyListPosition

0
ErikR On

There is very likely a way to get Parsec to use [a] as the stream type, but the idea behind parser combinators is actually very simple, and it's not very difficult to roll your own library.

A very accessible resource I would recommend is Monadic Parsing in Haskell by Graham Hutton and Erik Meijer.

Indeed, right now Erik Meijer is teaching an intro Haskell/functional programming course on edx.org (link) and Lecture 7 is all about functional parsers. As he states in the intro to the lecture:

"... No one can follow the path towards mastering functional programming without writing their own parser combinator library. We start by explaining what parsers are and how they can naturally be viewed as side-effecting functions. Next we define a number of basic parsers and higher-order functions for combining parsers. ..."