Unplanned greedy behaviour in uu-parsinglib

174 views Asked by At

The problem

I came across a problem today and I do not know how to solve it. It is very strange to me, because the code I've written should (according to my current knowledge) is correct.

So below you can find a sample parser combinators. The most important one is pOperator, which in very simple way (only for demonstration purposes) builds an operator AST. It consumes "x" and can consume multiple "x" separated by whitespaces.

I've got also pParens combinator which is defined like:

pPacked pParenL (pWSpaces *> pParenR)

so it consumes Whitespaces before closing bracket.

Sample input / output

The correct input/output SHOULD be:

in: "(x)"
out: Single "x"

in: "(x )"
out: Single "x"

but I'm getting:

in: "(x)"
out: Single "x"

in: "(x )" 
out: Multi (Single "x") (Single "x")
--  Correcting steps: 
--    Inserted  'x' at position LineColPos 0 3 3 expecting one of ['\t', ' ', 'x']

but in the second example I'm getting error - and the parser behaves like it greedy eats some tokens (and there is no greedy operation).

I would be thankful for any help with it.

Sample code

import Prelude hiding(lex)
import Data.Char hiding (Space)
import qualified Text.ParserCombinators.UU as UU
import           Text.ParserCombinators.UU hiding(parse)
import qualified Text.ParserCombinators.UU.Utils as Utils
import           Text.ParserCombinators.UU.BasicInstances hiding (Parser)


data El = Multi El El
        | Single String
        deriving (Show)


---------- Example core grammar ----------

pElement     = Single <$> pSyms "x"
pOperator    = applyAll <$> pElement <*> pMany (flip <$> (Multi <$ pWSpaces1) <*> pElement)

---------- Basic combinators ----------

applyAll x (f:fs) = applyAll (f x) fs
applyAll x []     = x

pSpace    = pSym ' '
pTab      = pSym '\t'
pWSpace   = pSpace <|> pTab
pWSpaces  = pMany pWSpace
pWSpaces1 = pMany1 pWSpace
pMany1 p  = (:) <$> p <*> pMany p

pSyms []       = pReturn []
pSyms (x : xs) = (:) <$> pSym x <*> pSyms xs

pParenL     = Utils.lexeme $ pSym '('
pParenR     = Utils.lexeme $ pSym ')'
pParens     = pPacked pParenL (pWSpaces *> pParenR)

---------- Program ----------

pProgram = pParens pOperator
-- if you replace it with following line, it works:
--  pProgram = pParens pElement
-- so it seems like something in pOperator is greedy

tests = [ ("test", "(x)")
        , ("test", "(x )")
        ]

---------- Helpers ----------

type Parser a = P (Str Char String LineColPos) a

parse p s = UU.parse ( (,) <$> p <*> pEnd) (createStr (LineColPos 0 0 0) s)

main :: IO ()
main = do 
    mapM_ (\(desc, p) -> putStrLn ("\n=== " ++ desc ++ " ===") >> run pProgram p) tests
    return ()

run :: Show t =>  Parser t -> String -> IO ()
run p inp = do  let (a, errors) =  parse p inp
                putStrLn ("--  Result: \n" ++ show a)
                if null errors then  return ()
                               else  do putStr ("--  Correcting steps: \n")
                                        show_errors errors
                putStrLn "-- "
             where show_errors :: (Show a) => [a] -> IO ()
                   show_errors = sequence_ . (map (putStrLn . show))

IMPORTANT

pOperator    = applyAll <$> pElement <*> pMany (flip <$> (Multi <$ pWSpaces1) <*> pElement)

is equivalent to:

foldr pChainl pElement (Multi <$ pWSpaces1)

according to: Combinator Parsing: A Short Tutorial

And it is used to define operator precedense.

1

There are 1 answers

0
Doaitse Swierstra On

The definition of pMany reads:

pMany :: IsParser p => p a -> p [a]
pMany p = pList p

and this suggest the solution. When seeing the space we should not commit immediately to the choice to continue with more x-es so we define:

pMany :: IsParser p => p a -> p [a]
pMany_ng p = pList_ng p

Of course you may also call pList_ng immediately. Even better would be to write:

pParens (pChainr_ng (pMulti <$ pWSpaces1) px) -- 

I did not test it since I am not sure whether between x-es there should be at least one space etc.

Doaitse