How do I use a qi::symbols parser to match tokens from a spirit lexer using no_case?

I have a lexer based on spirit::lexertl that produces tokens defined with lex::token_def<std::string>. I'd like to use a qi::symbols<> table to match tokens in that table, using the associated data from the symbol table as the attribute in the rule. Something like this [condensed from actual code]:

qi::symbols<char, int> mode_table;
mode_table.add("normal", 0)("lighten", 1)("darken", 2);

rule<Iterator, int()> mode = raw_token(tok.kMode) >> ':' >> ascii::no_case[mode_table];

When I compile that, however, I get the following error:

/Users/tim/Documents/src/tr_libs/boost/boost_1_49_0/boost/spirit/home/qi/string/detail/tst.hpp:80: error: conversion from 'char' to non-scalar type 'boost::spirit::lex::lexertl::token<boost::spirit::line_pos_iterator<boost::spirit::multi_pass<std::istreambuf_iterator<char, std::char_traits<char> >, boost::spirit::iterator_policies::default_policy<boost::spirit::iterator_policies::ref_counted, boost::spirit::iterator_policies::buf_id_check, boost::spirit::iterator_policies::buffering_input_iterator, boost::spirit::iterator_policies::split_std_deque> > >, boost::mpl::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::spirit::basic_string<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, symbol_type>, double, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, long unsigned int>' requested

line 80 in tst.hpp is this:

                c = filter(*i);

It sure looks to me like it's trying to convert my lexer token to a char, which I understand is the character type in the symbols<char, int> table. On a whim, I did try symbols<ident, int> (where ident is my token type), but that's clearly not the documented symbols<> API, and predictably it didn't work.

(You may ask why I don't just have the lexer emit these identifiers as token IDs, like kMode in the sample above. I could possibly do that in this particular case, but I'm really curious about the general case of integrating a symbol table in a grammar with a lexer.)

Fundamentally, I think my question is this: is it possible to use qi::symbols<> in this way, to match a token from a Spirit lexer?

Answer by Jeff Trull (best answer)

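It isn't possible to use a symbols instance directly as you have: qi::symbols is a character-level parser that reads chars from the input iterator, and with a lexer in front, that iterator yields tokens instead (hence the failed conversion at c = filter(*i)). Through the use of Phoenix semantic actions it can be done, though, at the cost of increased verbosity. If you had a token_def<std::string> representing the value you wished to look up in the symbol table, you could integrate it into a rule like this: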

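// symtab_t is the type of mode_table from the question
typedef qi::symbols<char, int> symtab_t;
// the local _a holds the lookup result: a pointer to the stored value, or null
qi::rule<Iterator, qi::locals<int const*>, int()> modename;
using namespace boost::phoenix;
// disambiguate symbols::find method (there are two!)
typedef const symtab_t::value_type * (symtab_t::*findfn_t)(std::string const&) const;
modename = tok.modeName[_a = bind(static_cast<findfn_t>(&symtab_t::find),
                                  cref(mode_table), _1),
                        _pass = _a,
                        if_(_a)[_val = *_a]];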

which manually looks up the token's string value in the symbol table, fails if it's not present, and otherwise copies the integer value found to the rule's result attribute.

Handling case-insensitivity can also be done via semantic actions, either in the parser (by converting to lower case prior to performing the lookup) or by converting as the tokens are created in the lexer. The latter approach could be handled like this:

this->self +=
     modeName[ 
        let(_a = construct<std::string>(_start, _end)) [
            bind(&to_lower<std::string>, ref(_a),
                 // must supply even defaulted arguments
                 construct<std::locale>()),
            _val = _a
            ]
         ];

This creates a copy of the underlying range and calls to_lower on it, supplying the result as the token value.
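For comparison, the parser-side alternative mentioned earlier (lower-casing just before the lookup) can be sketched with the standard library alone. This is only an illustration of the normalize-then-look-up idea: std::map stands in for the qi::symbols table, and the names to_lower_ascii and lookup_mode_no_case are hypothetical. It also assumes ASCII keywords, whereas the lexer action above passes a std::locale to to_lower.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: returns a lower-cased copy of 'text' (ASCII only).
inline std::string to_lower_ascii(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return text;
}

// Hypothetical helper: case-insensitive lookup against a table whose keys
// are stored in lower case. std::map is a stand-in for qi::symbols here.
inline bool lookup_mode_no_case(std::map<std::string, int> const& table,
                                std::string const& token_text,
                                int& out)
{
    std::map<std::string, int>::const_iterator it =
        table.find(to_lower_ascii(token_text));
    if (it == table.end())
        return false;   // unknown mode name: the rule should fail
    out = it->second;   // known mode name: becomes the rule's attribute
    return true;
}
```

Either way, the table keys must be stored in lower case for the normalized lookup to succeed.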

A full example can be found here