The problem
I came across a problem today and I do not know how to solve it. It is very strange to me, because the code I've written should (according to my current knowledge) is correct.
So below you can find a sample parser combinators. The most important one is pOperator
, which in very simple way (only for demonstration purposes) builds an operator AST.
It consumes "x" and can consume multiple "x" separated by whitespaces.
I've got also pParens
combinator which is defined like:
pPacked pParenL (pWSpaces *> pParenR)
so it consumes Whitespaces before closing bracket.
Sample input / output
The correct input/output SHOULD be:
in: "(x)"
out: Single "x"
in: "(x )"
out: Single "x"
but I'm getting:
in: "(x)"
out: Single "x"
in: "(x )"
out: Multi (Single "x") (Single "x")
-- Correcting steps:
-- Inserted 'x' at position LineColPos 0 3 3 expecting one of ['\t', ' ', 'x']
but in the second example I'm getting error - and the parser behaves like it greedy eats some tokens (and there is no greedy operation).
I would be thankful for any help with it.
Sample code
import Prelude hiding(lex)
import Data.Char hiding (Space)
import qualified Text.ParserCombinators.UU as UU
import Text.ParserCombinators.UU hiding(parse)
import qualified Text.ParserCombinators.UU.Utils as Utils
import Text.ParserCombinators.UU.BasicInstances hiding (Parser)
data El = Multi El El
| Single String
deriving (Show)
---------- Example core grammar ----------
pElement = Single <$> pSyms "x"
pOperator = applyAll <$> pElement <*> pMany (flip <$> (Multi <$ pWSpaces1) <*> pElement)
---------- Basic combinators ----------
applyAll x (f:fs) = applyAll (f x) fs
applyAll x [] = x
pSpace = pSym ' '
pTab = pSym '\t'
pWSpace = pSpace <|> pTab
pWSpaces = pMany pWSpace
pWSpaces1 = pMany1 pWSpace
pMany1 p = (:) <$> p <*> pMany p
pSyms [] = pReturn []
pSyms (x : xs) = (:) <$> pSym x <*> pSyms xs
pParenL = Utils.lexeme $ pSym '('
pParenR = Utils.lexeme $ pSym ')'
pParens = pPacked pParenL (pWSpaces *> pParenR)
---------- Program ----------
pProgram = pParens pOperator
-- if you replace it with following line, it works:
-- pProgram = pParens pElement
-- so it seems like something in pOperator is greedy
tests = [ ("test", "(x)")
, ("test", "(x )")
]
---------- Helpers ----------
type Parser a = P (Str Char String LineColPos) a
parse p s = UU.parse ( (,) <$> p <*> pEnd) (createStr (LineColPos 0 0 0) s)
main :: IO ()
main = do
mapM_ (\(desc, p) -> putStrLn ("\n=== " ++ desc ++ " ===") >> run pProgram p) tests
return ()
run :: Show t => Parser t -> String -> IO ()
run p inp = do let (a, errors) = parse p inp
putStrLn ("-- Result: \n" ++ show a)
if null errors then return ()
else do putStr ("-- Correcting steps: \n")
show_errors errors
putStrLn "-- "
where show_errors :: (Show a) => [a] -> IO ()
show_errors = sequence_ . (map (putStrLn . show))
IMPORTANT
pOperator = applyAll <$> pElement <*> pMany (flip <$> (Multi <$ pWSpaces1) <*> pElement)
is equivalent to:
foldr pChainl pElement (Multi <$ pWSpaces1)
according to: Combinator Parsing: A Short Tutorial
And it is used to define operator precedense.
The definition of pMany reads:
and this suggest the solution. When seeing the space we should not commit immediately to the choice to continue with more x-es so we define:
Of course you may also call pList_ng immediately. Even better would be to write:
I did not test it since I am not sure whether between x-es there should be at least one space etc.
Doaitse