Using Alex and Happy on a tiny grammar for arithmetic expressions (only + and - among integers), I want to implement an eval() function, which parses as an Expr the string given as an argument.
Here's an example of the expected result:
$ stack run
1 + eval("3-2")
Plus (ConstInt 1) (Minus (ConstInt 3) (ConstInt 2))
The content of eval was parsed correctly.
This question is about understanding a bit better why the way I implemented works (I would expect NOT to work!) and whether it should be done differently.
The Lexer:
{
module Lexer where
import Control.Monad (when)
}
%wrapper "monadUserState"
$digit = 0-9
tokens :-
<0> $white+ { skip }
<0> "(" { tok TokLParen }
<0> ")" { tok TokRParen }
<0> "+" { tok TokPlus }
<0> "-" { tok TokMinus }
<0> "eval" { tok TokBuiltinEvalExpr }
<0> $digit+ { tokInt }
<0> \" { enterString `andBegin` string }
<string> \" { exitString `andBegin` 0 }
<string> \\\\ { emit '\\' }
<string> \\\" { emit '"' }
<string> \\n { emit '\n' }
<string> \\t { emit '\t' }
<string> . { emitCurrent }
{
data Token =
TokLParen
| TokRParen
| TokPlus
| TokMinus
| TokBuiltinEvalExpr
| TokConstInt Int
| TokConstStr String
| EOF
deriving (Eq, Show)
data AlexUserState = AlexUserState {
nestLevel :: Int
, strStart :: AlexPosn
, strBuffer :: String
} deriving (Show)
alexInitUserState :: AlexUserState
alexInitUserState = AlexUserState
{ nestLevel = 0
, strStart = AlexPn 0 0 0
, strBuffer = []
}
alexModifyUserState :: (AlexUserState -> AlexUserState) -> Alex ()
alexModifyUserState f = f <$> alexGetUserState >>= alexSetUserState
alexEOF :: Alex Token
alexEOF = do
(AlexPn _ line column, _, _, _) <- alexGetInput
startCode <- alexGetStartCode
when (startCode == string) $
alexError $ "Error: unclosed string at line " <> show line <> ", column " <> show column
pure EOF
tok :: Token -> AlexAction Token
tok ctor _ _ = pure ctor
tokInt :: AlexAction Token
tokInt (_, _, _, str) len =
pure $ TokConstInt $ read $ take len str
enterString, exitString :: AlexAction Token
enterString inp@(pos, _, _, _) len = do
alexModifyUserState $ \s -> s{strStart = pos, strBuffer = strBuffer s}
skip inp len
exitString _ _ = do
s <- alexGetUserState
alexSetUserState s{strStart = AlexPn 0 0 0, strBuffer = []}
pure $ TokConstStr $ reverse $ strBuffer s
emit :: Char -> AlexAction Token
emit c inp len = do
alexModifyUserState $ \s -> s{strBuffer = c : strBuffer s}
skip inp len
emitCurrent :: AlexAction Token
emitCurrent inp@(_, _, _, str) len = do
alexModifyUserState $ \s -> s{strBuffer = head str : strBuffer s}
skip inp len
}
The Parser:
{
module Parser where
import qualified Lexer as L
}
%name parser expr
%tokentype { L.Token }
%error { parseError }
%monad { L.Alex } { >>= } { pure }
%lexer { lexer } { L.EOF }
%token
'(' { L.TokLParen }
')' { L.TokRParen }
'+' { L.TokPlus }
'-' { L.TokMinus }
'eval' { L.TokBuiltinEvalExpr }
int { L.TokConstInt $$ }
str { L.TokConstStr $$ }
%left '+' '-'
%%
expr :: { Expr }
: op_expr { $1 }
| int { ConstInt $1 }
| eval_expr { $1 }
inject_string :: { () }
: str {% injectString $1 }
eval_expr :: { Expr }
: 'eval' '(' inject_string ')' expr { $5 }
op_expr :: { Expr }
: expr '+' expr { Plus $1 $3 }
| expr '-' expr { Minus $1 $3 }
{
data Expr =
ConstInt Int
| Plus Expr Expr
| Minus Expr Expr
deriving (Show)
injectString :: String -> L.Alex ()
injectString s = do
(a, b, c, t) <- L.alexGetInput
L.alexSetInput (a, b, c, s ++ t)
parseError :: L.Token -> L.Alex a
parseError tok = do
(L.AlexPn _ line column, _, _, _) <- L.alexGetInput
L.alexError $ "Parse error. Unexpected token " ++ show tok ++ " at line " ++ show line <> ", column " <> show column
lexer :: (L.Token -> L.Alex a) -> L.Alex a
lexer = (=<< L.alexMonadScan)
}
My approach was:
- use an empty monadic production to add the string as a prefix of the current Alex input buffer.
- then parse the (now modified) input with an
exprproduction
It works, but in my opinion it should fail.
I repeat here the productions involved:
inject_string :: { () }
: str {% injectString $1 }
eval_expr :: { Expr }
: 'eval' '(' inject_string ')' expr { $5 }
With an initial input
1 + eval("3-2")
Before the inject_string production has been processed, Alex's input is:
"3-2")
After the inject_string production has been processed, Alex's input is:
3-2)
That is, an expression followed by a closed parenthesis.
But rule eval_expr expects the opposite: ')' expr.
Why does that work?
If I turn the rule into what seems more logical to me:
eval_expr :: { Expr }
: 'eval' '(' inject_string expr ')' { $4 }
then I get:
1 + eval("3-2")
Parse error. Unexpected token TokRParen at line 1, column 16
My second question is whether there is a better approach to implement eval().
EDIT
Perhaps I understood why it works.
I said that after the inject_string production the input is:
3-2)
Because 3-2 is what I insert and ) is what is still to be read.
But ) is already consumed as a look-ahead token, isn't it?
So the input after the inject_string production is just
3-2
I'm still confused as why the next ) token in the production doesn't fail then. Is the look-ahead token not just consumed but already matched with the ) in the rule, even before inject_string is processed?
My understanding was that it is only matched in a "non-persistent" way to be able to solve possible rule ambiguities, but that it would still need to be matched after the inject_string production is done. Is this what I got wrong?
The second question still stands. Can I do this better?
With respect to your first question, in monadic Happy parsers, the monadic actions are executed at the time of reduction. If you dump the debugging info for your Happy parser with:
and inspect the resulting
Parser.infofile, you'll find the following state:When the parser is in state 12, having just read the string argument for an
eval, it will reduce theinject_stringnon-terminal only AFTER reading an expected)token. When this reduction happensAlexwill have just finished reading the)character, and your string will be injected at that point.With respect to your second question, you might want to consider the following issues. First, your parser happily accepts weird things like:
which seems undesirable. Second, if you think about the purpose of having an
evalfunction, it's to parse and evaluate a string at runtime, not parse time. In your current grammar, there's no way to generate strings that aren't just literals, but if you added syntax to convert numbers to strings withstrand concatenate strings with, say, a.operator, then you'd ideally like something like the following to work:and it doesn't seem reasonable to do string injection in the middle of the parser to accomplish this.
I'd suggest an alternative design where
eval("3-2")is parsed into anEvalconstructor with the string intact:(Future grammars could use
StringExprin place ofString, to support more complicated string expressions.)Your evaluation routine, at runtime, should then re-invoke the Happy parser to parse the string into an expression for evaluation.
If you really want to do this at parse time, I think you should take much the same approach. Don't fiddle with the parser's internal state. Instead, have your
evalrule re-invoke the parser on the string: