Why std::future is different returned from std::packaged_task and std::async?

2k views Asked by At

I got to know the reason that future returned from std::async has some special shared state through which wait on returned future happened in the destructor of future. But when we use std::pakaged_task, its future does not exhibit the same behavior. To complete a packaged task, you have to explicitly call get() on future object from packaged_task.

Now my questions are:

  1. What could be the internal implementation of future (thinking std::async vs std::packaged_task)?
  2. Why the same behavior was not applied to future returned from std::packaged_task? Or, in other words, how is the same behavior stopped for std::packaged_task future?

To see the context, please see the code below:

It does not wait to finish countdown task. However, if I un-comment // int value = ret.get();, it would finish countdown and is obvious because we are literally blocking on returned future.

    // packaged_task example
#include <iostream>     // std::cout
#include <future>       // std::packaged_task, std::future
#include <chrono>       // std::chrono::seconds
#include <thread>       // std::thread, std::this_thread::sleep_for

// count down taking a second for each value:
int countdown (int from, int to) {
  for (int i=from; i!=to; --i) {
    std::cout << i << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
  std::cout << "Lift off!" <<std::endl;
  return from-to;
}

int main ()
{
   std::cout << "Start " << std::endl;
  std::packaged_task<int(int,int)> tsk (countdown);   // set up packaged_task
  std::future<int> ret = tsk.get_future();            // get future

  std::thread th (std::move(tsk),10,0);   // spawn thread to count down from 10 to 0

//   int value = ret.get();                  // wait for the task to finish and get result

  std::cout << "The countdown lasted for " << std::endl;//<< value << " seconds.\n";

  th.detach();   

  return 0;
}

If I use std::async to execute task countdown on another thread, no matter if I use get() on returned future object or not, it will always finish the task.

// packaged_task example
#include <iostream>     // std::cout
#include <future>       // std::packaged_task, std::future
#include <chrono>       // std::chrono::seconds
#include <thread>       // std::thread, std::this_thread::sleep_for

    // count down taking a second for each value:
    int countdown (int from, int to) {
      for (int i=from; i!=to; --i) {
        std::cout << i << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(1));
      }
      std::cout << "Lift off!" <<std::endl;
      return from-to;
    }
    
    int main ()
    {
       std::cout << "Start " << std::endl;
      std::packaged_task<int(int,int)> tsk (countdown);   // set up packaged_task
      std::future<int> ret = tsk.get_future();            // get future
    
      auto fut = std::async(std::move(tsk), 10, 0);   

    
    //   int value = fut.get();                  // wait for the task to finish and get result
    
      std::cout << "The countdown lasted for " << std::endl;//<< value << " seconds.\n";

      return 0;
    }
3

There are 3 answers

0
Nicol Bolas On BEST ANSWER

std::async has definite knowledge of how and where the task it is given is executed. That is its job: to execute the task. To do that, it has to actually put it somewhere. That somewhere could be a thread pool, a newly created thread, or in a place to be executed by whomever destroys the future.

Because async knows how the function will be executed, it has 100% of the information it needs to build a mechanism that can communicate when that potentially asynchronous execution has concluded, as well as to ensure that if you destroy the future, then whatever mechanism that's going to execute that function will eventually get around to actually executing it. After all, it knows what that mechanism is.

But packaged_task doesn't. All packaged_task does is store a callable object which can be called with the given arguments, create a promise with the type of the function's return value, and provide a means to both get a future and to execute the function that generates the value.

When and where the task actually gets executed is none of packaged_task's business. Without that knowledge, the synchronization needed to make future's destructor synchronize with the task simply can't be built.

Let's say you want to execute the task on a freshly-created thread. OK, so to synchronize its execution with the future's destruction, you'd need a mutex which the destructor will block on until the task thread finishes.

But what if you want to execute the task in the same thread as the caller of the future's destructor? Well, then you can't use a mutex to synchronize that since it all on the same thread. Instead, you need to make the destructor invoke the task. That's a completely different mechanism, and it is contingent on how you plan to execute.

Because packaged_task doesn't know how you intend to execute it, it cannot do any of that.

Note that this is not unique to packaged_task. All futures created from a user-created promise object will not have the special property of async's futures.

So the question really ought to be why async works this way, not why everyone else doesn't.

If you want to know that, it's because of two competing needs: async needed to be a high-level, brain-dead simple way to get asynchronous execution (for which sychronization-on-destruction makes sense), and nobody wanted to create a new future type that was identical to the existing one save for the behavior of its destructor. So they decided to overload how future works, complicating its implementation and usage.

0
Mansoor On

The change in behaviour is due to the difference between std::thread and std::async.

In the first example, you have created a daemon thread by detaching. Where you print std::cout << "The countdown lasted for " << std::endl; in your main thread, may occur before, during or after the print statements inside the countdown thread function. Because the main thread does not await the spawned thread, you will likely not even see all of the print outs.

In the second example, you launch the thread function with the std::launch::deferred policy. The behaviour for std::async is:

If the async policy is chosen, the associated thread completion synchronizes-with the successful return from the first function that is waiting on the shared state, or with the return of the last function that releases the shared state, whichever comes first.

In this example, you have two futures for the same shared state. Before their dtors are called when exiting main, the async task must complete. Even if you had not explicitly defined any futures, the temporary future that gets created and destroyed (returned from the call to std::async) will mean that the task completes before the main thread exits.


Here is a great blog post by Scott Meyers, clarifying the behaviour of std::future & std::async.

Related SO post.

3
Sarfaraz Nawaz On

@Nicol Bolas has already answered this question quite satisfactorily. So I'll attempt to answer the question slightly from different perspective, elaborating the points already mentioned by @Nicol Bolas.

The design of related things and their goals

Consider this simple function which we want to execute, in various ways:

int add(int a, int b) {
    std::cout << "adding: " << a << ", "<< b << std::endl;
    return a + b;
}

Forget std::packaged_task, std ::future and std::async for a while, let's take one step back and revisit how std::function works — in particular, what problem it solves, and what problem it does not solve.

case 1 — std::function isn't good enough for executing things in different threads

std::function<int(int,int)> f { add };

Once we have f, we can execute it, in the same thread, like:

int result = f(1, 2); //note we can get the result here

Or, in a different thread, like this:

std::thread t { std::move(f), 3, 4 };
t.join(); 

If we see carefully, we realize that executing f in a different thread creates a new problem: how do we get the result of the function? Executing f in the same thread does not have that problem — we get the result as returned value, but when executed it in a different thread, we don't have any way to get the result. That is exactly what is solved by std::packaged_task.

case 2 — std::packaged_task solves the problem which std::function does not solve

In particular, it creates a channel between threads to send the result to the other thread. Apart from that, it is more or less same as std::function.

std::packaged_task<int(int,int)> f { add }; // almost same as before

std::future<int> channel = f.get_future();  // get the channel
    
std::thread t{ std::move(f), 30, 40 }; // same as before
t.join();  // same as before
    
int result = channel.get(); // problem solved: get the result from the channel

Now you see how std::packaged_task solves the problem which is left unsolved by std::function. That however does not mean that std::packaged_task has to be executed in a different thread. You can execute it in the same thread as well, just like std::function, though you will still get the result from the channel.

std::packaged_task<int(int,int)> f { add }; // same as before
std::future<int> channel = f.get_future(); // same as before
    
f(10, 20); // execute it in the current thread !!

int result = channel.get(); // same as before

So fundamentally std::function and std::packaged_task are similar kind of thing: they simply wrap callable entity, with one difference: std::packaged_task is multithreading-friendly, because it provides a channel through which it can pass the result to other threads. Both of them do NOT execute the wrapped callable entity by themselves. One needs to invoke them, either in the same thread, or in another thread, to execute the wrapped callable entity. So basically there are two kinds of thing in this space:

  • what is executed i.e regular functions, std::function, std::packaged_task, etc.
  • how/where is executed i.e threads, thread pools, executors, etc.

case 3: std::async is an entirely different thing

It's a different thing because it combines what-is-executed with how/where-is-executed.

std::future<int> fut = std::async(add, 100, 200);
int result = fut.get();

Note that in this case, the future created has an associated executor, which means that the future will complete at some point as there is someone executing things behind the scene. However, in case of the future created by std::packaged_task, there is not necessarily an executor and that future may never complete if the created task is never given to any executor.

Hope that helps you understand how things work behind the scene. See the online demo.

The difference between two kinds of std::future

Well, at this point, it becomes pretty much clear that there are two kinds of std::future which can be created:

  • One kind can be created by std::async. Such future has an associated executor and thus can complete.
  • Other kind can be created by std::packaged_task or things like that. Such future does not necessarily have an associated executor and thus may or may not complete.

Since, in the second case the future does not necessarily have an associated executor, its destructor is not designed for its completion/wait because it may never complete:

 {
   std::packaged_task<int(int,int)> f { add };
 
   std::future<int> fut = f.get_future(); 

 } // fut goes out of scope, but there is no point 
   // in waiting in its destructor, as it cannot complete 
   // because as `f` is not given to any executor.

Hope this answer helps you understand things from a different perspective.