C++ boost::spirit lexer regex


I'm doing a simple lexer/parser with boost::spirit.

This is the lexer :

template <typename Lexer>
struct word_count_tokens : lex::lexer<Lexer>
{
  word_count_tokens()
  {
    this->self.add_pattern
      ("WORD", "[a-z]+")
      ("NAME_CONTENT", "[a-z]+")
      ;

    word = "{WORD}";
    name = ".name";
    name_content = "{NAME_CONTENT}";

    this->self.add
      (word)
      (name)
      (name_content)
      ('\n')
      (' ')
      ('"')
      (".", IDANY)
      ;
  }
  lex::token_def<std::string> word;
  lex::token_def<std::string> name;
  lex::token_def<std::string> name_content;
};

I defined two identical patterns : WORD and NAME_CONTENT.

This is the grammar :

template <typename Iterator>
struct word_count_grammar : qi::grammar<Iterator>
{
  template <typename TokenDef>
  word_count_grammar(TokenDef const& tok)
    : word_count_grammar::base_type(start)
  {
    using boost::phoenix::ref;
    using boost::phoenix::size;

    start = tok.name >> lit(' ') >> lit('"') >> tok.word >> lit('"');
  }

  qi::rule<Iterator> start;
};

This code works with tok.word in the grammar, but if I replace tok.word with tok.name_content it does not work, even though tok.word and tok.name_content are defined by the same pattern.

What is the issue with this code ?

PS : what I want to parse is something like : .name "this is my name"

There is 1 answer

sehe

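Update: by the way, the underlying problem is that the lexer produces exactly one token for each match, and the token definitions are tried in the order they were added. Because word and name_content share the same pattern and word is added first, that input always comes out as a word token, so tok.name_content never matches in the grammar. You can work around this by using lexer states, but I don't recommend that any more than I recommend using a lexer here in the first place.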
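To make the ordering point concrete, here is a minimal sketch of a first-match-wins rule table. It uses std::regex as a stand-in and is not how Spirit.Lex is implemented; the names Rule, first_matching_rule and make_rules are hypothetical and exist only for this illustration. It models just the tie-break between two rules that match the same text.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical toy rule: a token name plus the regex it matches.
struct Rule {
    std::string name;
    std::regex  pattern;
};

// Return the name of the first rule (in registration order) whose pattern
// matches the whole of `text`, or an empty string if no rule matches.
std::string first_matching_rule(std::vector<Rule> const& rules,
                                std::string const& text) {
    for (auto const& rule : rules) {
        if (std::regex_match(text, rule.pattern))
            return rule.name;
    }
    return "";
}

// Register the rules in the same order as the lexer in the question.
std::vector<Rule> make_rules() {
    return {
        {"word",         std::regex("[a-z]+")},
        {"name_content", std::regex("[a-z]+")}, // identical pattern, added second
    };
}
```

With these rules, first_matching_rule(make_rules(), "this") returns "word", and no input can ever produce "name_content", because any text the second rule accepts has already been claimed by the first.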


My suggestion would be to use Qi directly:

    qi::lexeme[".name"] >> qi::lexeme['"' >> *~qi::char_('"') >> '"']

My recollection of Lexer token patterns is one of exceedingly confusing escape requirements.
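One escaping pitfall that is easy to show: in the question's lexer the pattern ".name" starts with an unescaped dot, and the same lexer uses "." to mean "any character". The sketch below demonstrates the effect with std::regex, which is only a stand-in here; Spirit.Lex has its own pattern syntax and the details may differ. The helper matches_fully is a hypothetical name used for illustration.

```cpp
#include <regex>
#include <string>

// True if `pattern`, interpreted as a regular expression, matches all of `text`.
bool matches_fully(std::string const& pattern, std::string const& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

Here matches_fully(".name", "xname") is true, because the dot matches any character, while matches_fully("\\.name", "xname") is false and matches_fully("\\.name", ".name") is true. Note the doubled backslash: one level of escaping is for the C++ string literal, the other for the regex.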

I might try to figure it out later - out of curiosity only

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;

int main() {
    std::string const input(".name \"this is my name\"");

    auto f(input.begin()), l(input.end());


    std::string parsed_name;
    if (qi::phrase_parse(f,l,
                qi::lexeme[".name"] >> qi::lexeme['"' >> *~qi::char_('"') >> '"'],
                qi::space,
                parsed_name))
    {
        std::cout << "Parsed: '" << parsed_name << "'\n";
    }
    else
    {
        std::cout << "Parse failed\n";
    }

    if (f!=l)
        std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}

Prints

Parsed: 'this is my name'