Case Insensitive String Comparison of Boost::Spirit Token Text in Semantic Action

I've got a tokeniser and a parser. The parser has a special token type, KEYWORD, for keywords (there are ~50). In my parser I want to ensure that each token is the one I expect, so I've got a rule for each keyword, like so:

KW_A = tok.KEYWORDS[_pass = (_1 == "A")];
KW_B = tok.KEYWORDS[_pass = (_1 == "B")];
KW_C = tok.KEYWORDS[_pass = (_1 == "C")];

This works well enough, but it's not case insensitive (and the grammar I'm trying to handle is!). I'd like to use boost::iequals, but attempts to convert _1 to a std::string result in the following error:

error: no viable conversion from 'const _1_type' (aka 'const actor<argument<0> >') to 'std::string' (aka 'basic_string<char>')

How can I treat these keywords as strings and ensure they're the expected text irrespective of case?
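For reference, the comparison itself is easy to express outside Spirit. The error above arises because _1 is a lazy Phoenix placeholder rather than a string, so a helper like this cannot be called on it directly; it would have to be wrapped in a lazy function (for example with boost::phoenix::function). The following is only a minimal standard-library sketch: iequals_ascii is a hypothetical name, it handles ASCII only, and it is not the Boost implementation.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of Boost or Spirit): ASCII-only,
// case-insensitive equality, similar in intent to boost::iequals.
inline bool iequals_ascii(std::string const& lhs, std::string const& rhs)
{
    return lhs.size() == rhs.size() &&
        std::equal(lhs.begin(), lhs.end(), rhs.begin(),
            [](char a, char b)
            {
                // Cast to unsigned char first: std::tolower has undefined
                // behaviour for negative values other than EOF.
                return std::tolower(static_cast<unsigned char>(a)) ==
                       std::tolower(static_cast<unsigned char>(b));
            });
}
```

With this, iequals_ascii("select", "SELECT") is true and iequals_ascii("select", "selects") is false.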

Answer by Liam M

A little learning went a long way. I added the following to my lexer:

struct normalise_keyword_impl
{
    template <typename Value>
    struct result
    {
        typedef void type;
    };

    template <typename Value>
    void operator()(Value const& val) const
    {
        // This modifies the original input string.
        typedef boost::iterator_range<std::string::iterator> iterpair_type;
        iterpair_type const& ip = boost::get<iterpair_type>(val);
        std::for_each(ip.begin(), ip.end(),
            [](char& in)
            {
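                // Cast via unsigned char: std::toupper is undefined for negative char values.
                in = static_cast<char>(std::toupper(static_cast<unsigned char>(in)));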
            });
    }
};

    boost::phoenix::function<normalise_keyword_impl> normalise_keyword;

    // The rest...
};

And then used phoenix to bind the action to the keyword token in my constructor, like so:

this->self =
    KEYWORD [normalise_keyword(_val)]
    // The rest...
    ;

Although this accomplishes what I was after, it modifies the original input sequence. Is there some modification I could make so that I could use const_iterator instead of iterator, and avoid modifying my input sequence?
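The non-mutating variant being asked about boils down to building an upper-cased copy from a pair of const iterators. Here is a minimal sketch of just that step, using only the standard library; to_upper_copy_ascii is a hypothetical name and the Spirit token plumbing is deliberately left out.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: build an upper-cased copy of [first, last)
// without writing through the iterators, so const_iterator is enough.
template <typename ConstIterator>
std::string to_upper_copy_ascii(ConstIterator first, ConstIterator last)
{
    std::string out;
    std::transform(first, last, std::back_inserter(out),
        [](char c)
        {
            // Cast via unsigned char to keep std::toupper well defined.
            return static_cast<char>(
                std::toupper(static_cast<unsigned char>(c)));
        });
    return out;
}
```

Called on the first six characters of a const std::string holding "select a from b", this returns "SELECT" and leaves the input untouched.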

I tried returning a std::string copied from ip.begin() to ip.end() and uppercased using boost::to_upper(...), then assigning that to _val. Although it compiled and ran, there were clearly some problems with what it was producing:

Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
result is SELECT
Token: 0: KEYWORD ('KEYWOR')
Token: 1: REGULAR_IDENTIFIER ('a')
result is FROM
Token: 0: KEYWORD ('KEYW')
Token: 1: REGULAR_IDENTIFIER ('b')

Very peculiar, it appears I have some more learning to do.

Final Solution

Okay, I ended up using this function:

struct normalise_keyword_impl
{
    template <typename Value>
    struct result
    {
        typedef std::string type;
    };

    template <typename Value>
    std::string operator()(Value const& val) const
    {
        // Copy the token and update the attribute value.
        typedef boost::iterator_range<std::string::const_iterator> iterpair_type;
        iterpair_type const& ip = boost::get<iterpair_type>(val);

        std::string result(ip.begin(), ip.end());
        boost::to_upper(result);  // upper-cases the copy in place
        return result;
    }
};

And this semantic action:

KEYWORD [_val = normalise_keyword(_val)]

Together with a modified token_type (this is what sorted things out):

typedef std::string::const_iterator base_iterator;
typedef boost::spirit::lex::lexertl::token<base_iterator, boost::mpl::vector<std::string> > token_type;
typedef boost::spirit::lex::lexertl::actor_lexer<token_type> lexer_type;
typedef type_system::Tokens<lexer_type> tokens_type;
typedef tokens_type::iterator_type iterator_type;
typedef type_system::Grammar<iterator_type> grammar_type;

// Establish our lexer and our parser.
tokens_type lexer;
grammar_type parser(lexer);

// ...

The important addition is boost::mpl::vector<std::string> as the second template argument of the token type. The result:

Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
Token: 0: KEYWORD ('SELECT')
Token: 1: REGULAR_IDENTIFIER ('a')
Token: 0: KEYWORD ('FROM')
Token: 1: REGULAR_IDENTIFIER ('b')

I have no idea why this corrected the problem, so if someone could chime in with their expertise, I'm a willing student.
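One possible reading, offered as an assumption rather than something confirmed here: listing std::string among the token's attribute types lets the token value hold either the matched iterator range or a std::string, so a string assigned to _val can be stored as a string. The sketch below models only that idea with std::variant (C++17); token_value, range_type and text_of are hypothetical names and this is not Spirit's actual implementation.

```cpp
#include <string>
#include <utility>
#include <variant>

// Hypothetical model of a token value that can hold either the matched
// input range or an owned, already-converted string.
using const_iter  = std::string::const_iterator;
using range_type  = std::pair<const_iter, const_iter>;
using token_value = std::variant<range_type, std::string>;

// Return the token text regardless of which alternative is stored.
inline std::string text_of(token_value const& v)
{
    if (auto const* s = std::get_if<std::string>(&v))
        return *s;                       // owned string, e.g. upper-cased
    auto const& r = std::get<range_type>(v);
    return std::string(r.first, r.second);  // copy from the input range
}
```

In this model, a value constructed from the range over "select" yields "select", and after assigning std::string("SELECT") to it, text_of yields "SELECT" while the input string is unchanged.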