Is it possible to make zero-allocation coroutine runtime in C++?

1.6k views Asked by At

In Rust, asynchronous functions do not require allocating on the heap. Function async returns a compiler-generated structure on which you call a compiler generated poll method. The design of async looks logical and clear.

The design of coroutines in C++ is strange. It forces you to do allocation on the heap, because you have no other way to return something using co_return.

(It is possible to make a custom allocator that will make allocations in a buffer on the stack, but this will unnecessarily complicate the code.)

Why was it decided in the design of C++ that an object returned by a coroutine must have a promise_type?

Why is await_ready, await_suspend, await_resume not enough?

(It looks strange, and this is what enforces you to do allocation; you can't just directly construct a SomeTask<T> object (with three await_* methods) and return it.)

How can we make a zero-alloc coroutines without a custom allocator?

2

There are 2 answers

11
Nicol Bolas On BEST ANSWER

Essentially it forces you to do allocation on the heap, because you have no other way to return something using co_return

That's not the reason for the allocation.

Coroutines require storage to do their job. Not just the marshaling of return values, but for tracking the entire thing. The coroutine's stack, the thing that has to be paused and resumed, is part of this.

The whole point of a coroutine is that it can suspend execution and have that execution resumed by somebody. Who that somebody is is not known statically. And that is the reason why dynamic allocation is needed; coroutine lifetimes are not limited by the scope of their initial invoker.

If function X invokes coroutine A, and coroutine A suspends itself pending some async request, function X may just return the coroutine to someone else who will wait on it. This means that function X's call stack is gone. If coroutine A lived on function X's stack, its storage is gone now.

That's bad.

C++ stores coroutine A in dynamically-allocated memory to allow it to survive its calling invocation. This is the presumed default case of coroutines.

Presumably, the circumstances you are describing are those where function X itself is a coroutine that awaits on A, or otherwise will not leave its own scope until A is finished. This would allow the coroutine state of A to live on X's stack without problems. C++ compilers are permitted to optimize these conditions if they are detected; they can elide the dynamic allocation.

But otherwise, that dynamic allocation is necessary.

I don't know Rust. So I don't know what Rust is doing with its async functions. Maybe its async functions are unable to persist beyond the boundaries of their calling scope. Maybe Rust is better able to detect when the async function needs to be able to live beyond its caller's scope. Maybe you have to explicitly tell Rust when you want a coroutine to escape its caller's scope. I don't know.

But C++ isn't like that. C++ coroutines are presumed to leave the scope of their invoking function.

0
VadimP22 On

I have found a solution to my problem. At the moment, it's certainly not ready to use, but I've chosen the right direction.

I wrote this code:

template <typename Fn, typename ArgT = int, typename ... Next> struct Async {
    ArgT arg;
    bool is_executed = false;
    Async<Next...> next;

    Async(Fn fn, ArgT arg) : arg(arg) { }
    Async(Fn fn) {}
    Async() {}
   
    void Execute() {
        Fn fn;
        next.arg = fn(arg); 
    }

    bool Poll() {
        if (is_executed) {
            return next.Poll();
        } else {
            Execute();
            is_executed = true;
        }

        return false;
    }

    template<typename NextFn> auto Then(NextFn fn) -> Async<Fn, ArgT, Next..., NextFn, int> {
        Fn f;
        return Async<Fn, ArgT, Next..., NextFn, int>(f, arg);
    }

};

template <typename Fn, typename ArgT> struct Async<Fn, ArgT> {
    ArgT arg;
    bool is_executed = false;

    Async(Fn fn, ArgT arg) : arg(arg) { }
    Async(Fn fn) {}
    Async() {}

    template<typename NextFn> auto Then(NextFn fn) -> Async<Fn, ArgT, NextFn, int> {
        Fn f;
        return Async<Fn, ArgT, NextFn, int>(f, arg);
    }

    void Execute() {
        Fn fn;
        fn(arg); 
    }

    bool Poll() {
        if(!is_executed) {
            Execute();
            is_executed = true;
        }

        return true;
    }

};

And now I can write like this:

int main() {
    auto a = Async([](auto f){
        std::cout << "[lambda 1] takes " << f << std::endl;
        return 1234;
    }, 4).Then([](auto f){
        std::cout << "[lambda 2] takes " << f << std::endl;
        return 4321;
    }).Then([](auto f){       
        std::cout << "[lambda 3] takes " << f << std::endl;
    });

    while (!a.Poll()) {}
}

This approach has a disadvantage: you cannot use co_* operators, but the API is quite convenient.

As I already said, the code is not ready yet and I have a lot of work to do. When I'm done I'll update code in this answer.