I'm trying to parse a simple language defined as follows:
import Data.Functor.Identity
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec
import qualified Text.Parsec.Expr as Expr
data G
= Low Int
| Up Int
| And G G
| Or G G
deriving stock (Eq, Show)
parseIt :: Text -> Either ParseError G
parseIt = parse defP "parseIt"
type Parser = Parsec Text ()
defP :: Parser G
defP = goP <* eof
where
goP :: Parser G
goP = Expr.buildExpressionParser table term
table :: Expr.OperatorTable Text () Identity G
table = [[binary And "&&", binary Or "||"]]
term :: Parser G
term =
choice
[ parens goP,
unary Up ">",
unary Low "<"
]
binary :: (G -> G -> G) -> String -> Expr.Operator Text () Identity G
binary func operator = Expr.Infix (string operator >> return func) Expr.AssocLeft
unary :: (Int -> G) -> String -> Parser G
unary mkSpec op = do
void $ string op
skipSpaces
mkSpec <$> numP
parens = between (symbol "(") (symbol ")")
where
symbol name = lexeme (string name)
lexeme p = do x <- p; skipSpaces; return x
skipSpaces = skipMany space
numP :: Parser Int
numP = do
xs <- many1 digit
return $ read xs
I have few test cases to exercise it:
import Control.Monad
import Test.Hspec
main :: IO ()
main = hspec spec
spec :: Spec
spec = do
describe "should be parsed" $ do
forM_
[ (">1", Up 1),
("< 42", Low 42),
(">1 && <42", Up 1 `And` Low 42),
(">1 || <2 && >5", Up 1 `Or` (Low 2 `And` Up 5)),
("((>1 || <2)) && >5", (Up 1 `Or` Low 2) `And` Up 5)
]
$ \(raw, expected :: G) ->
it (T.unpack raw) $ parseIt raw `shouldBe` Right expected
But they fail on binary operators:
should be parsed
>1 [✔]
< 42 [✔]
>1 && <42 [✘]
>1 || <2 && >5 [✘]
((>1 || <2)) && >5 [✘]
Failures:
test/Spec.hs:29:43:
1) parseVersionSpec, should be parsed, >1 && <42
expected: Right (And (Up 1) (Low 42))
but got: Left "parseIt" (line 1, column 3):
unexpected ' '
expecting digit, operator or end of input
To rerun use: --match "/parseVersionSpec/should be parsed/>1 && <42/"
test/Spec.hs:29:43:
2) parseVersionSpec, should be parsed, >1 || <2 && >5
expected: Right (Or (Up 1) (And (Low 2) (Up 5)))
but got: Left "parseIt" (line 1, column 3):
unexpected ' '
expecting digit, operator or end of input
To rerun use: --match "/parseVersionSpec/should be parsed/>1 || <2 && >5/"
test/Spec.hs:29:43:
3) parseVersionSpec, should be parsed, ((>1 || <2)) && >5
expected: Right (And (Or (Up 1) (Low 2)) (Up 5))
but got: Left "parseIt" (line 1, column 5):
unexpected " "
expecting digit, operator or ")"
To rerun use: --match "/parseVersionSpec/should be parsed/((>1 || <2)) && >5/"
Randomized with seed 1024517159
Finished in 0.0016 seconds
5 examples, 3 failures
*** Exception: ExitFailure 1
I cannot find proper examples, any help would be appreciated.
The problem is that, in parsing
">1 && <42",termparses">1", but leaves a space at the beginning of the remainder of the input stream" && <42", which causesbinaryto fail.To properly handle whitespace, you should write a set of lexemes that each expect to start parsing at non-whitespace and take responsibility for absorbing any trailing whitespace when finished, and then write the rest of your parser in terms of these lexemes only, without using non-lexeme parsers like
string.Move your
lexemeandsymboldefinitions up to top-level, or at least the level ofdefP'swhereclause:Define
numPas a lexeme:and in the rest of your parsers, make use only of the lexeme-level parsers
numPandsymbol.For example, replace
string/skipSpacesinunarywithsymbol:This is a valid lexeme parser, because it parses the lexeme
symbol opfollowed by the lexemenumP. Do the same inbinary:Also, in your top-most parser
defP, allow leading whitespace:Finally, if you actually want
&&to have higher precedence than||, you need to replace:with:
The resulting parser should pass all your tests:
Stylistically, you may also find that switching everything to consistent applicative style makes for a nicer looking parser. Given everything's in a
whereclause, I might also argue that dropping most of the type signatures would be better. They don't do much for readability: