Parse key, value pairs when key is not unique

My input consists of multiple key=value pairs, e.g.:

A=1, B=2, C=3, ..., A=4

I want to parse the input into the following type:

 std::map< char, std::vector< int > > m

Values for equal keys shall be appended to the vector. So the parsed output should be equal to:

m['A']={1,4};
m['B']={2};
m['C']={3};

What is the simplest solution using boost::spirit::qi?

Answer by Andrey Semashev:

Here is one way to do it:

#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/vector.hpp>
#include <boost/fusion/include/at_c.hpp>
#include <iostream>
#include <utility>
#include <string>
#include <vector>
#include <map>

namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;

int main()
{
    std::string str = "A=1, B=2, C=3, A=4";

    std::map< char, std::vector< int > > m;
    auto inserter = [&m](fusion::vector< char, int > const& parsed,
        qi::unused_type, qi::unused_type)
    {
        m[fusion::at_c< 0 >(parsed)].push_back(fusion::at_c< 1 >(parsed));
    };

    auto it = str.begin(), end = str.end();
    bool res = qi::phrase_parse(it, end,
        ((qi::char_ >> '=' >> qi::int_)[inserter]) % ',',
        qi::space);

    if (res && it == end)
        std::cout << "Parsing complete" << std::endl;
    else
        std::cout << "Parsing incomplete" << std::endl;

    for (auto const& elem : m)
    {
        std::cout << "m['" << elem.first << "'] = {";
        for (auto value : elem.second)
            std::cout << " " << value;
        std::cout << " }" << std::endl;
    }

    return 0;
}

A few comments about the implementation:

  1. qi::phrase_parse is a Boost.Spirit algorithm that takes a pair of iterators, a parser, and a skip parser, and runs the parsers on the input denoted by the iterators. In the process, it advances the beginning iterator (it in this example) so that, upon return, it points to the end of the consumed input. The returned res value indicates whether the parser succeeded (i.e. whether the consumed input could be parsed). This is why the example checks both res and it == end: res can be true even when only a prefix of the input was consumed. There are other forms of qi::phrase_parse that extract attributes (the parsed data, in Boost.Spirit terms), but we're not using attributes here because the container structure you want is unusual.

  2. The skip parser is used to skip portions of the input between the elements of the main parser. In this case, qi::space means that any whitespace characters in the input are ignored, so that e.g. "A = 1" and "A=1" are parsed the same way. There is also the qi::parse family of algorithms, which do not take a skip parser and therefore require the main parser to handle all of the input without skips.

  3. The (qi::char_ >> '=' >> qi::int_) part of the main parser matches any single character, followed by the equals sign character, followed by a signed integer. The equals sign is expressed as a literal (i.e. it is equivalent to the qi::lit('=') parser), which means it only matches the input and does not produce any parsed data. Therefore the result of this parser is an attribute that is a sequence of two elements: a character and an integer.

  4. The % ',' part of the parser is a list parser, which parses one or more pieces of input described by the parser on the left (which is the parser described above), separated by the pieces described by the parser on the right (i.e. by comma characters in our case). As before, the comma character is a literal parser, so it doesn't produce output.

  5. The [inserter] part is a semantic action, which is a function that the parser calls every time it matches a portion of the input string. The parser passes all its parsed output as the first argument to this function. In our case the semantic action is attached to the parser described in bullet #3, which means a sequence of a character and an integer is passed. Boost.Spirit uses a fusion::vector to pass this data. The other two arguments of the semantic action are not used in this example and can be ignored.

  6. The inserter function in this example is a lambda function, but it could be any other kind of function object, including a regular function, a function generated by std::bind, etc. The important part is that it has the specified signature and that the type of its first argument is compatible with the attribute of the parser to which it is attached as a semantic action. So, if we had a different parser in bullet #3, this argument would have to be changed accordingly.

  7. fusion::at_c< N >() in the inserter obtains the element of the vector at index N. It is very similar to std::get< N >() when applied to std::tuple.