Correctly parsing line indentations in uu-parsinglib in Haskell

256 views Asked by At

I want to create a parser combinator, which will collect all lines below current place, which indentation levels will be greater or equal some i. I think the idea is simple:

Consume a line - if its indentation is:

  • ok -> do it for next lines
  • wrong -> fail

Lets consider following code:

import qualified Text.ParserCombinators.UU as UU
import           Text.ParserCombinators.UU hiding(parse)
import           Text.ParserCombinators.UU.BasicInstances hiding (Parser)

-- end of line
pEOL   = pSym '\n'

pSpace = pSym ' '
pTab   = pSym '\t'

indentOf s = case s of
    ' '  -> 1
    '\t' -> 4

-- return the indentation level (number of spaces on the beginning of the line)
pIndent = (+) <$> (indentOf <$> (pSpace <|> pTab)) <*> pIndent `opt` 0

-- returns tuple of (indentation level, result of parsing the second argument)
pIndentLine p = (,) <$> pIndent <*> p <* pEOL

-- SHOULD collect all lines below witch indentations greater or equal i
myParse p i = do
    (lind, expr) <- pIndentLine p
    if lind < i
        then pFail
        else do
            rest <- myParse p i `opt` []
            return $ expr:rest

-- sample inputs
s1 = " a\
   \\n a\
   \\n"

s2 = " a\
   \\na\
   \\n"

-- execution
pProgram = myParse (pSym 'a') 1 

parse p s = UU.parse ( (,) <$> p <*> pEnd) (createStr (LineColPos 0 0 0) s)

main :: IO ()
main = do 
    print $ parse pProgram s1
    print $ parse pProgram s2
    return ()

Which gives following output:

("aa",[])
Test.hs: no correcting alternative found

The result for s1 is correct. The result for s2 should consume first "a" and stop consuming. Where this error comes from?

1

There are 1 answers

4
Doaitse Swierstra On

The parsers which you are constructing will always try to proceed; if necessary input will be discarded or added. However pFail is a dead-end. It acts as a unit element for <|>.

In you parser there is however no other alternative present in case the input does not comply to the language recognised by the parser. In you specification you say you want the parser to fail on input s2. Now it fails with a message saying that is fails, and you are surprised.

Maybe you do not want it to fail, but you want to stop accepting further input? In that case replace pFail by return [].

Note that the text:

do
    rest <- myParse p i `opt` []
    return $ expr:rest

can be replaced by (expr:) <$> (myParse p i `opt` [])

A natural way to solve your problem is probably something like

pIndented p = do i <- pGetIndent
             (:) <$> p <* pEOL  <*> pMany (pToken (take i (repeat ' ')) *> p <* pEOL)

pIndent = length <$> pMany (pSym ' ')