I am in the initial design phase of implementing a Jinja2-like template language for Elixir. I had been inclined to write the lexer by hand, but I recently came across the leex module for Erlang. It looks promising, but after some initial research I am unsure whether it is the proper tool for my purposes.
One of my hesitations is that a template language is, essentially, a string-embedded language, and it is not clear how to tokenize one with leex. As a trivial example, imagine tokenizing this template:
<p>Here is some text for inclusion in the template.</p>
{% for x in some_variable %}
The value for the variable: {{ x }}.
{% endfor %}
In this example, I need to ensure that the keywords 'for' and 'in' are tokenized differently depending on whether:
- they are inside a statement tag: {% %}
- they are inside an expression tag: {{ }}
- they appear in the template text, outside of any tags.
To me this looks like I would need either to do two passes in the tokenizing phase, or to roll my own lexer in order to do this in one pass.
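For context, here is a minimal sketch of the single-pass, hand-rolled approach: the current context is carried as an explicit mode argument, so the same word is tokenized differently inside and outside of tags. All module and token names here are illustrative, not a working design.

```elixir
# Minimal sketch of a single-pass, hand-rolled lexer. The context is
# carried as a `mode` argument (:text, :tag for {% %}, :expr for {{ }}),
# so `for`/`in` are only keywords inside {% %} tags. A real lexer would
# match keywords on word boundaries rather than bare prefixes.
defmodule TemplateLexer do
  def tokenize(input), do: lex(input, :text, "", [])

  # Mode transitions: flush buffered text when crossing a delimiter.
  defp lex("{%" <> rest, :text, buf, toks), do: lex(rest, :tag, "", flush(buf, toks))
  defp lex("{{" <> rest, :text, buf, toks), do: lex(rest, :expr, "", flush(buf, toks))
  defp lex("%}" <> rest, :tag, buf, toks), do: lex(rest, :text, "", flush(buf, toks))
  defp lex("}}" <> rest, :expr, buf, toks), do: lex(rest, :text, "", flush(buf, toks))

  # Keywords are only recognized in :tag mode.
  defp lex("for" <> rest, :tag, buf, toks), do: lex(rest, :tag, "", [{:for} | flush(buf, toks)])
  defp lex("in" <> rest, :tag, buf, toks), do: lex(rest, :tag, "", [{:in} | flush(buf, toks)])

  # Everything else accumulates: raw text outside tags, identifiers inside.
  defp lex(<<c, rest::binary>>, mode, buf, toks), do: lex(rest, mode, buf <> <<c>>, toks)
  defp lex("", _mode, buf, toks), do: Enum.reverse(flush(buf, toks))

  defp flush("", toks), do: toks
  defp flush(buf, toks), do: [{:text, buf} | toks]
end
```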
I am wondering if anyone with experience in lexical analysis (particularly with leex) or in writing template engines can provide some insight into the best way forward?
Let me apologize in advance if this isn't helpful, but I think of lexical analysis as having the power of regular expressions, and as such I suspect that what you are trying to do is not in the sweet spot of REs or Leex. The first pass would be to go from source code to lexical elements (tokens), which would mostly be devoid of context and would be an appropriate use of Leex.
I think the different, context-sensitive semantics of your FOR and IN tokens would be handled during parsing, with Erlang's Yecc. You may be able to handle comments in the lexical analysis phase, but in general I think you would use a combination of Leex and Yecc.
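To make that concrete, a context-free Leex rule set along these lines would emit uniform tokens and leave the keyword question to the Yecc grammar. This is only a sketch of a `.xrl` file under that assumption; the token names are illustrative:

```erlang
%% Sketch of a context-free Leex rule set (e.g. template.xrl).
%% The lexer emits the same `word` token for every identifier-like
%% run; the Yecc grammar then decides whether a word such as "for"
%% or "in" acts as a keyword, based on whether it appears between
%% open_tag and close_tag tokens.
Definitions.

Word = [a-zA-Z_][a-zA-Z0-9_]*
WS   = [\s\t\n\r]

Rules.

\{%     : {token, {open_tag,   TokenLine}}.
[%]\}   : {token, {close_tag,  TokenLine}}.
\{\{    : {token, {open_expr,  TokenLine}}.
\}\}    : {token, {close_expr, TokenLine}}.
{Word}  : {token, {word, TokenLine, TokenChars}}.
{WS}+   : skip_token.
.       : {token, {char, TokenLine, TokenChars}}.

Erlang code.
```

Note that skipping whitespace loses the exact spacing of the literal template text, so in practice the parser (or a later pass) would need to reassemble runs of word/char tokens outside tags back into raw text; that is one of the trade-offs of pushing all context handling into the parsing phase.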