boost x3 grammar for structs with multiple constructors

Trying to figure out how to parse structs that have multiple or overloaded constructors. For example, a range struct that holds either a range or a singleton, where the start and end of the range are equal.

Case 1 looks like:

"start-stop"

Case 2 looks like:

"start"

For the range case

auto range_constraint = x3::rule<struct test_struct, MyRange>{} = (x3::int_ >> x3::lit("-") >> x3::int_);

works but

auto range_constraint = x3::rule<struct test_struct, MyRange>{} = x3::int_ | (x3::int_ >> x3::lit("-") >> x3::int_);

unsurprisingly, won't match the signature and fails to compile.

Not sure what the fix is?

#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct MyRange
{
    size_t start;
    size_t end;
    // little bit weird because should be end+1, but w/e
    explicit MyRange(size_t start, size_t end = 0) : start(start), end(end == 0 ? start : end)
    {
    }
};
BOOST_FUSION_ADAPT_STRUCT(MyRange, start, end)
// BOOST_FUSION_ADAPT_STRUCT(MyRange, start)
//

int main()
{
 
    auto range_constraint = x3::rule<struct test_struct, MyRange>{} = (x3::int_ >> x3::lit("-") >> x3::int_);
    // auto range_constraint = x3::rule<struct test_struct, MyRange>{} = x3::int_ | (x3::int_ >> x3::lit("-") >> x3::int_);

    for (std::string input : {"1-2", "1", "1-", "garbage"})
    {
        auto success = x3::phrase_parse(input.begin(), input.end(),
                                        // Begin grammar
                                        range_constraint,
                                        // End grammar
                                        x3::ascii::space);
        std::cout << "`" << input << "`"
                  << "-> " << success << std::endl;
    }
    return 0;
}

Answer by sehe (best answer)

It's important to realize that sequence adaptation by definition uses default construction with subsequent sequence element assignment.
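To make that concrete, here is a minimal plain C++ sketch of what default-construct-then-assign amounts to. It does not use Spirit or Fusion at all; `DefaultableRange` and `fill_memberwise` are hypothetical names made up for this illustration, and the function only mimics the propagation pattern under that assumption.

```cpp
#include <cstddef>
#include <type_traits>

// The struct from the question: no default constructor, and the
// constructor carries logic (a missing end falls back to start).
struct MyRange {
    std::size_t start;
    std::size_t end;
    explicit MyRange(std::size_t start, std::size_t end = 0)
        : start(start), end(end == 0 ? start : end) {}
};

// Hypothetical variant (not in the question) that adds a default constructor.
struct DefaultableRange {
    std::size_t start = 0;
    std::size_t end   = 0;
    DefaultableRange() = default;
    explicit DefaultableRange(std::size_t s, std::size_t e = 0)
        : start(s), end(e == 0 ? s : e) {}
};

// Hypothetical stand-in for attribute propagation into an adapted sequence:
// default-construct first, then assign the members that were parsed.
template <typename Range>
Range fill_memberwise(std::size_t parsed_start) {
    Range r{};               // requires a default constructor
    r.start = parsed_start;  // only the first element was parsed
    return r;                // the two-argument constructor never ran
}

// The question's struct cannot even take part in this pattern.
static_assert(!std::is_default_constructible<MyRange>::value,
              "MyRange has no default constructor");
static_assert(std::is_default_constructible<DefaultableRange>::value,
              "DefaultableRange can be default-constructed");
```

In this sketch `fill_memberwise<DefaultableRange>(1)` leaves `end` at 0, whereas `DefaultableRange(1)` sets `end` to 1, because the fallback lives in a constructor that memberwise assignment bypasses.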

Another issue is branch ordering in PEG grammars. int_ will always succeed where int_ >> '-' >> int_ would, so you would never match the range version.
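The ordering problem can be reproduced without Spirit. The following is a hand-rolled sketch of ordered choice in plain C++; all function names are hypothetical, and it only models the first-match-wins behaviour of a PEG alternative, not X3 itself.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Consume one or more digits starting at pos; fails if there are none.
bool parse_uint(const std::string& s, std::size_t& pos) {
    std::size_t p = pos;
    while (p < s.size() && std::isdigit(static_cast<unsigned char>(s[p])))
        ++p;
    if (p == pos) return false;
    pos = p;
    return true;
}

// Models uint >> '-' >> uint; pos is left untouched on failure (backtracking).
bool parse_range(const std::string& s, std::size_t& pos) {
    std::size_t p = pos;
    if (!parse_uint(s, p)) return false;
    if (p >= s.size() || s[p] != '-') return false;
    ++p;
    if (!parse_uint(s, p)) return false;
    pos = p;
    return true;
}

// Models uint | range: the single value is tried first and wins.
// Returns the number of characters consumed (0 if nothing matched).
std::size_t single_then_range(const std::string& s) {
    std::size_t pos = 0;
    if (parse_uint(s, pos) || parse_range(s, pos)) return pos;
    return 0;
}

// Models range | uint: the longer alternative gets its chance first.
std::size_t range_then_single(const std::string& s) {
    std::size_t pos = 0;
    if (parse_range(s, pos) || parse_uint(s, pos)) return pos;
    return 0;
}
```

Under these assumptions `single_then_range("1-2")` stops after one character, while `range_then_single("1-2")` consumes all three and still falls back to the single value for `"1"`.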

Finally, to parse size_t usually prefer uint_/uint_parser<size_t> :)

Things That Don't Work

There are several ways to skin this cat. For one, there's BOOST_FUSION_ADAPT_STRUCT_NAMED, which would allow you to do

BOOST_FUSION_ADAPT_STRUCT_NAMED(MyRange, Range, start, end)
BOOST_FUSION_ADAPT_STRUCT_NAMED(MyRange, SingletonRange, start)

So one pretty elaborate approach would seem to be spelling it out:

auto range     = x3::rule<struct _, Range>{}          = uint_ >> '-' >> uint_;
auto singleton = x3::rule<struct _, SingletonRange>{} = uint_;
auto rule      = x3::rule<struct _, MyRange>{}        = range | singleton;

TIL that this doesn't even compile; apparently Qi was different: Live On Coliru

X3 requires the attribute to be default-constructible whereas Qi would attempt to bind to the passed-in attribute reference first.

Even in the Qi version you can see that, because Fusion sequences are default-constructed and then memberwise-assigned, you get results you didn't expect or want:

`1-2` -> true
 -- [1,NIL)
`1` -> true
 -- [1,NIL)
`1-` -> true
 -- [1,NIL)
`garbage` -> false

What Works

Instead of doing the complicated things, do the simple thing. Anytime you see an optional value you can usually provide a default value. Alternatively you can not use Sequence adaptation at all, and go straight to semantic actions.

Semantic Actions

The simplest way would be to have specific branches:

auto assign1 = [](auto& ctx) { _val(ctx) = MyRange(_attr(ctx)); };
auto assign2 = [](auto& ctx) { _val(ctx) = MyRange(at_c<0>(_attr(ctx)), at_c<1>(_attr(ctx))); };

auto rule = x3::rule<void, MyRange>{} =
    (uint_ >> '-' >> uint_)[assign2] | uint_[assign1];

Slightly more advanced, but more efficient:

auto assign1 = [](auto& ctx) { _val(ctx) = MyRange(_attr(ctx)); };
auto assign2 = [](auto& ctx) { _val(ctx) = MyRange(_val(ctx).start, _attr(ctx)); };

auto rule = x3::rule<void, MyRange>{} = uint_[assign1] >> -('-' >> uint_[assign2]);

Lastly, we can move towards defaulting the optional end:

auto rule = x3::rule<void, MyRange>{} =
    (uint_ >> ('-' >> uint_ | x3::attr(MyRange::unspecified))) //
        [assign];

Now the semantic action will have to deal with the variant end type:

auto assign = [](auto& ctx) {
    auto start = at_c<0>(_attr(ctx));
    _val(ctx)  = apply_visitor(                         //
        [=](auto end) { return MyRange(start, end); }, //
        at_c<1>(_attr(ctx)));
};

Also Live On Coliru

Simplify?

I'd consider modeling the range explicitly as having an optional end:

struct MyRange {
    MyRange() = default;
    MyRange(size_t s, boost::optional<size_t> e = {}) : start(s), end(e) {
        assert(!e || *e >= s);
    }

    size_t size() const  { return end? *end - start : 1; }
    bool   empty() const { return size() == 0; }

    size_t                  start = 0;
    boost::optional<size_t> end   = 0;
};

Now you can directly use the optional to construct:

auto assign = [](auto& ctx) {
    _val(ctx) = MyRange(at_c<0>(_attr(ctx)), at_c<1>(_attr(ctx)));
};

auto rule = x3::rule<void, MyRange>{} = (uint_ >> -('-' >> uint_))[assign];

Actually, here we can go back to using adapted sequences, although with different semantics:

Live On Coliru

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;

struct MyRange {
    size_t                  start = 0;
    boost::optional<size_t> end   = 0;
};

static inline std::ostream& operator<<(std::ostream& os, MyRange const& mr) {
    if (mr.end)
        return os << "[" << mr.start << "," << *mr.end << ")";
    else
        return os << "[" << mr.start << ",)";
}

BOOST_FUSION_ADAPT_STRUCT(MyRange, start, end)

int main() {
    x3::uint_parser<size_t> uint_;
    auto rule = x3::rule<void, MyRange>{} = uint_ >> -('-' >> uint_);

    for (std::string const input : {"1-2", "1", "1-", "garbage"}) {
        MyRange into;
        auto    success = phrase_parse(input.begin(), input.end(), rule, x3::space, into);
        std::cout << quoted(input, '`') << " -> " << std::boolalpha << success
                  << std::endl;

        if (success) {
            std::cout << " -- " << into << "\n";
        }
    }
}

Summarizing

I hope these strategies give you everything you need. Pay close attention to the semantics of your range. Specifically, I never paid any attention to the difference between "1" and "1-". You might want one to be [1,2) and the other to be [1,inf), both to be equivalent, or the second one might even be considered invalid.
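As an illustration of one such choice, here is a plain C++ sketch that maps the three input shapes to half-open intervals. It assumes a hypothetical parse result, `ParsedRange`, that records whether a dash was seen; none of the grammars above produce that flag, so treat the struct, the `interpret` function and the `unbounded` sentinel as assumptions rather than part of the answer.

```cpp
#include <cstddef>
#include <limits>
#include <optional>
#include <utility>

// Hypothetical parse result that keeps "1" and "1-" distinguishable.
struct ParsedRange {
    std::size_t                start    = 0;
    bool                       has_dash = false;
    std::optional<std::size_t> end;
};

// Half-open interval [first, second).
using Interval = std::pair<std::size_t, std::size_t>;

// Sentinel standing in for "no upper bound" (an assumption of this sketch).
constexpr std::size_t unbounded = std::numeric_limits<std::size_t>::max();

// One possible policy: "1-2" -> [1,2), "1-" -> [1,inf), "1" -> [1,2).
Interval interpret(ParsedRange const& p) {
    if (p.end)      return Interval{p.start, *p.end};
    if (p.has_dash) return Interval{p.start, unbounded};
    return Interval{p.start, p.start + 1};
}
```

Treating both shapes as equivalent, or rejecting `"1-"`, would only change the last two branches.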

Stepping back even further, I'd suggest that maybe you just needed

using Bound   = std::optional<size_t>;
using MyRange = std::pair<Bound, Bound>;

Which you could parse directly with:

auto boundary = -x3::uint_parser<size_t>{};
auto rule = x3::rule<void, MyRange>{} = boundary >> '-' >> boundary;

It would allow for more inputs:

for (std::string const input : {"-2", "1-2", "1", "1-", "garbage"}) {
    MyRange into;
    auto    success = phrase_parse(input.begin(), input.end(), rule, x3::space, into);
    std::cout << quoted(input, '`') << " -> " << std::boolalpha << success
              << std::endl;

    if (success) {
        std::cout << " -- " << into << "\n";
    }
}

Prints: Live On Coliru

`-2` -> true
 -- [,2)
`1-2` -> true
 -- [1,2)
`1` -> false
`1-` -> true
 -- [1,)
`garbage` -> false
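The loop above streams `into` directly, but the printer for the `std::pair` is not shown. A minimal sketch of one that would produce this output format, assuming a hypothetical helper named `format_range` (not part of the original answer):

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

using Bound   = std::optional<std::size_t>;
using MyRange = std::pair<Bound, Bound>;

// Hypothetical helper: renders a range as "[start,end)" and leaves
// a missing bound empty, e.g. "[,2)" or "[1,)".
std::string format_range(MyRange const& r) {
    std::ostringstream os;
    os << "[";
    if (r.first) os << *r.first;
    os << ",";
    if (r.second) os << *r.second;
    os << ")";
    return os.str();
}
```

With this helper the print statement would become `std::cout << " -- " << format_range(into) << "\n";`.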