parsing into std::vector<string> with Spirit Qi, getting segfaults or assert failures


I am using Spirit Qi to parse mathematical expressions into an expression tree. As I parse, I keep track of things such as the types of the symbols encountered, which must be declared in the text being parsed. Specifically, I am parsing Bertini input files; a simple-ish example is here, a complicated example is here, and for completeness, one is given below:

%input: our first input file
  variable_group x,y;
  function f,g;

  f = x^2 - 1;
  g = y^2 - 4;
 END;

The grammar I have been working on will ideally

  • find declaration statements, and then parse the following comma-separated list of symbols of the type being declared, and store the resulting vector of symbols in the class object being parsed into; e.g. variable_group x, y;
  • find a previously declared symbol, which is followed by an equals sign, and is the definition of that symbol as an evaluatable mathematical object; e.g. f = x^2 - 1; This part I mostly have under control.
  • find a not-previously declared symbol followed by =, and parse it as a subfunction. I think I can handle this, too.

The problem I have been struggling to solve seems like it is so trivial, yet after hours of searching, I still haven't gotten there. I've read dozens of Boost Spirit mailing list posts, SO posts, the manual, and the headers for Spirit themselves, yet still don't quite grok a few critical things about Spirit Qi parsing.

Here is the problematic basic grammar definition, which would go in system_parser.hpp:

#define BOOST_SPIRIT_USE_PHOENIX_V3 1


#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>




namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;


template<typename Iterator>
struct SystemParser : qi::grammar<Iterator, std::vector<std::string>(), boost::spirit::ascii::space_type>
{


    SystemParser() : SystemParser::base_type(variable_group_)
    {
        namespace phx = boost::phoenix;
        using qi::_1;
        using qi::_val;
        using qi::eps;
        using qi::lit;

        qi::symbols<char,int> encountered_variables;

        qi::symbols<char,int> declarative_symbols;
        declarative_symbols.add("variable_group",0);



        // wraps the vector between its appropriate declaration and line termination.
        BOOST_SPIRIT_DEBUG_NODE(variable_group_);
        debug(variable_group_);
        variable_group_.name("variable_group_");
        variable_group_ %= lit("variable_group") >> genericvargp_ >> lit(';');


        // creates a vector of strings
        BOOST_SPIRIT_DEBUG_NODE(genericvargp_);
        debug(genericvargp_);
        genericvargp_.name("genericvargp_");
        genericvargp_ %= new_variable_ % ',';




        // will in the future make a shared pointer to an object using the string
        BOOST_SPIRIT_DEBUG_NODE(new_variable_);
        debug(new_variable_);
        new_variable_.name("new_variable_");
        new_variable_ %= unencountered_symbol_;


        // this rule gets a string.
        BOOST_SPIRIT_DEBUG_NODE(unencountered_symbol_);
        debug(unencountered_symbol_);
        unencountered_symbol_.name("unencountered_symbol");
        unencountered_symbol_ %= valid_variable_name_ - ( encountered_variables | declarative_symbols);


        // get a string which fits the naming rules.
        BOOST_SPIRIT_DEBUG_NODE(valid_variable_name_);
        valid_variable_name_.name("valid_variable_name_");
        valid_variable_name_ %= +qi::alpha >> *(qi::alnum | qi::char_('_') | qi::char_('[') | qi::char_(']') );



    }


    // rule declarations.  these are member variables for the parser.
    qi::rule<Iterator, std::vector<std::string>(), ascii::space_type > variable_group_;
    qi::rule<Iterator, std::vector<std::string>(), ascii::space_type > genericvargp_;
    qi::rule<Iterator, std::string(), ascii::space_type>  new_variable_;
    qi::rule<Iterator, std::string(), ascii::space_type > unencountered_symbol_;// , ascii::space_type


    // the rule which determines valid variable names
    qi::rule<Iterator, std::string()> valid_variable_name_;
};

and some code which uses it:

#include "system_parsing.hpp"



int main(int argc, char** argv)
{


    std::vector<std::string> V;
    std::string str = "variable_group x, y, z;";


    std::string::const_iterator iter = str.begin();
    std::string::const_iterator end = str.end();


    SystemParser<std::string::const_iterator> S;


    bool s = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);

    std::cout << "the unparsed string:\n" << std::string(iter,end);


    return 0;
}

It compiles under Clang 4.9.x on OSX just fine. When I run it, I get:

Assertion failed: (px != 0), function operator->, file /usr/local/include/boost/smart_ptr/shared_ptr.hpp, line 648.

Alternately, if I use expectation operator > rather than >> in the definition of the variable_group_ rule, I get our dear old friend Segmentation fault: 11.

In my learning process, I've come across such excellent posts as how to tell the type Spirit is trying to generate, attribute propagation, how to interact with symbols, an example of infinite left recursion which led to a segfault, information on parsing into classes, not structs (which links to using customization points, yet the links contain no examples), and the Nabialek trick, which couples keywords to actions. Perhaps most relevant for what I am trying to do is dynamic difference parsing, which is certainly something I need: the set of already-encountered symbols starts empty and grows, and I disallow reuse of those symbols as another type later. That is, the rules for parsing are dynamic.

So here's where I am at. My current problem is the assert/segfault generated by this particular example. However, I am unclear on some things, and need guiding advice, which I just haven't put together from any of the sources I have consulted, and the request for which hopefully makes this SO question disjoint from others previously asked:

  • When is it appropriate to use lexeme? I just don't know when to use lexeme, and not.
  • What are some guidelines for when to use > rather than >>?
  • I've seen many Fusion adapt examples where there is a struct to be parsed into, and a set of rules to do so. My input files will possibly have multiple occurrences of declarations of function, variables, etc, which all need to go the same place, so I need to be able to add to fields of the terminal class object into which I am parsing, in any order, multiple times. I think I would like to use getter/setters for the class object, so that parsing is not the only pathway to object construction. Is this a problem?

Any kind advice for this beginner is most welcome.

Answer by sehe (best answer):

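Your rules reference the symbol tables (encountered_variables and declarative_symbols), but those are locals of the constructor, so they no longer exist once the constructor returns. Parsing with the grammar afterwards invokes Undefined Behaviour: anything can happen, including the assert and the segfault you see.

Make the symbol tables members of the class.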

I also simplified the dance around:

  • the skippers (see Boost spirit skipper issues). That link also answers your "When is it appropriate to use lexeme[]?" question. In your sample, for example, you lacked the lexeme[] around encountered_variables|declarative_symbols.
  • the debug macros
  • the operator%=, and some generally unused stuff
  • the mapped type of the symbols<>: guessing you didn't need it (because the int wasn't consumed), I simplified the initialization there

Demo

Live On Coliru

#define BOOST_SPIRIT_USE_PHOENIX_V3 1
#define BOOST_SPIRIT_DEBUG 1

#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi    = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

template <typename Iterator, typename Skipper = ascii::space_type>
struct SystemParser : qi::grammar<Iterator, std::vector<std::string>(), Skipper> {

    SystemParser() : SystemParser::base_type(variable_group_) 
    {
        declarative_symbols += "variable_group";

        variable_group_        = "variable_group" >> genericvargp_ >> ';';
        genericvargp_          = new_variable_ % ',';
        valid_variable_name_   = qi::alpha >> *(qi::alnum | qi::char_("_[]"));
        unencountered_symbol_  = valid_variable_name_ - (encountered_variables|declarative_symbols);
        new_variable_          = unencountered_symbol_;

        BOOST_SPIRIT_DEBUG_NODES((variable_group_) (valid_variable_name_) (unencountered_symbol_) (new_variable_) (genericvargp_))
    }
  private:

    qi::symbols<char, qi::unused_type> encountered_variables, declarative_symbols;

    // rule declarations.  these are member variables for the parser.
    qi::rule<Iterator, std::vector<std::string>(), Skipper> variable_group_;
    qi::rule<Iterator, std::vector<std::string>(), Skipper> genericvargp_;
    qi::rule<Iterator, std::string()> new_variable_;
    qi::rule<Iterator, std::string()> unencountered_symbol_; // , Skipper

    // the rule which determines valid variable names
    qi::rule<Iterator, std::string()> valid_variable_name_;
};

//#include "system_parsing.hpp"

int main() {

    using It = std::string::const_iterator;
    std::string const str = "variable_group x, y, z;";

    SystemParser<It> S;

    It iter = str.begin(), end = str.end();
    std::vector<std::string> V;
    bool s = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);

    if (s)
    {
        std::cout << "Parse succeeded: " << V.size() << "\n";
        for (auto& s : V)
            std::cout << " - '" << s << "'\n";
    }
    else
        std::cout << "Parse failed\n";

    if (iter!=end)
        std::cout << "Remaining unparsed: '" << std::string(iter, end) << "'\n";
}

Prints

Parse succeeded: 3
 - 'x'
 - 'y'
 - 'z'