I have several C utilities. Each of them reads data from stdin, processes it, and writes the result to stdout. Each utility stops when its stdin is closed. A simplified example of such a utility is listed below.
child:

#include <stdio.h>

int main() {
    char buf[256];
    int nread;
    do {
        nread = fread(buf, 1, 256, stdin);
        // Process data...
        fwrite(buf, 1, nread, stdout); // write only as much as was read
    } while (nread > 0);
    return 0;
}
This allows us to make a chain of these utilities connected through pipes, for example with bash: u1 < data.bin | u2 | u3 | ... | un > result.bin. Once the first process's stdin reaches the end of the file and is closed, all processes in the chain close in a cascade.
And now I need to execute a more complicated chain of these utilities from C++. Actually, not only chains but also complicated graphs.
But there is a problem with stopping the chain. When I execute one child and close its stdin, the child process stops (fread returns 0). But when I execute two or more children, connected with boost::process::pstream or boost::process::pipe (it doesn't actually matter which), nothing happens when I close stdin: the first child in the chain keeps waiting for data from stdin. Simple examples are listed below.
Example 1: one child stops when I close its stdin - OK
#include <boost/process.hpp>

namespace bp = boost::process;

int main(int argc, char *argv[]) {
    bp::opstream cstdin;
    bp::ipstream cstdout;
    bp::child c0("child", bp::std_in < cstdin, bp::std_out > cstdout);
    cstdin.pipe().close();
    cstdin.close();
    c0.wait();
    return 0;
}
Example 2: two children don't stop when I close the first child's stdin - PROBLEM
#include <boost/process.hpp>

namespace bp = boost::process;

int main(int argc, char *argv[]) {
    bp::opstream cstdin;
    bp::ipstream cstdout;
    bp::pstream connector;
    bp::child c0("child", bp::std_in < cstdin, bp::std_out > connector);
    bp::child c1("child", bp::std_in < connector, bp::std_out > cstdout);
    cstdin.pipe().close();
    cstdin.close();
    c0.wait();
    c1.wait();
    return 0;
}
Running on Debian 10, gcc 8.3, boost 1.80.
How can I fix it?
I think you're running into synchronous IO deadlocks here. The streams will buffer, but when the buffer is at capacity they will block.
We can demonstrate this with a single child:
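The original Coliru listing is not reproduced here; as a stand-in, here is a minimal sketch of such a single-child test, assuming the question's child binary and its data.bin sample file. The parent pushes the whole input through the opstream first and only then drains the ipstream, so both sides can end up blocked on full pipe buffers:

#include <boost/process.hpp>
#include <fstream>
#include <iostream>

namespace bp = boost::process;

int main() {
    bp::opstream cstdin;
    bp::ipstream cstdout;
    bp::child c("child", bp::std_in < cstdin, bp::std_out > cstdout);

    std::ifstream input("data.bin", std::ios::binary); // sample input file
    cstdin << input.rdbuf();   // can block once the stdin pipe buffer is full,
                               // because nobody is draining the child's stdout yet
    cstdin.pipe().close();     // signal EOF to the child

    std::cout << cstdout.rdbuf(); // only reached if the write above completed
    c.wait();
}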
Now, testing with a simple program (Live On Coliru) may complete for some small sample files, but on my system, adding a larger file like /etc/dictionaries-common/words makes the thing block.

A simple workaround here would be to make the IO pumps use separate threads:
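Again as a sketch rather than the original listing (same assumptions as above): the writer moves onto its own thread, so the input and output pumps make progress independently and neither can stall the other on a full pipe buffer. The two streams are separate objects, so the threads share no state:

#include <boost/process.hpp>
#include <fstream>
#include <iostream>
#include <thread>

namespace bp = boost::process;

int main() {
    bp::opstream cstdin;
    bp::ipstream cstdout;
    bp::child c("child", bp::std_in < cstdin, bp::std_out > cstdout);

    std::thread writer([&] {
        std::ifstream input("data.bin", std::ios::binary);
        cstdin << input.rdbuf();
        cstdin.pipe().close();    // EOF so the child can finish
    });

    std::cout << cstdout.rdbuf(); // drain stdout concurrently on the main thread
    writer.join();
    c.wait();
}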
And indeed now it works, but this doesn't really scale well.
Enter Async IO
To prevent the need for threads, let alone many of them, as well as the perils of synchronizing access to shared objects, I'd suggest using the async interface of Boost Process:
Live On Coliru
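The full listing is only on the Coliru link; what follows is a condensed sketch of the same approach, assuming the question's child binary and the data.bin/result.bin file names. Every pipe is a bp::async_pipe, a single io_context drives all transfers, and closing the head pipe lets EOF cascade down the chain:

#include <boost/asio.hpp>
#include <boost/process.hpp>
#include <fstream>
#include <iterator>
#include <string>

namespace bp = boost::process;

int main() {
    boost::asio::io_context io;

    bp::async_pipe cstdin(io), connector(io), cstdout(io);
    bp::child c0("child", bp::std_in < cstdin, bp::std_out > connector);
    bp::child c1("child", bp::std_in < connector, bp::std_out > cstdout);

    // The parent never uses the connector itself; closing its copies of the
    // handles lets EOF propagate from c0 to c1 once c0 exits.
    connector.close();

    // Asynchronously feed the whole input file, then close to signal EOF.
    std::ifstream ifs("data.bin", std::ios::binary);
    std::string input((std::istreambuf_iterator<char>(ifs)), {});
    boost::asio::async_write(cstdin, boost::asio::buffer(input),
        [&](boost::system::error_code, std::size_t) { cstdin.close(); });

    // Asynchronously collect everything the last child writes until EOF.
    std::string output;
    boost::asio::async_read(cstdout, boost::asio::dynamic_buffer(output),
        [](boost::system::error_code, std::size_t) {});

    io.run(); // drives both pumps until the chain drains and closes
    c0.wait();
    c1.wait();

    std::ofstream("result.bin", std::ios::binary) << output;
}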
As you can see from the output using 3 random Coliru samples, the output is correct.
Now this looks tricky, but it's mainly because of the weird loop surrounding my choice of example/test.
Abstracting
You can package it all up to make it simpler:
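For instance (a rough sketch only; the Chain struct and run_chain helper are hypothetical names, not taken from the linked listing), you can wire up an arbitrary list of commands with bp::async_pipe connectors and hand the caller just the two endpoints:

#include <boost/asio.hpp>
#include <boost/process.hpp>
#include <list>
#include <string>
#include <vector>

namespace bp = boost::process;

struct Chain {
    bp::async_pipe         first_in;  // write the chain's input here
    bp::async_pipe         last_out;  // read the chain's result here
    std::vector<bp::child> children;
};

Chain run_chain(boost::asio::io_context& io, std::vector<std::string> cmds) {
    Chain chain{bp::async_pipe(io), bp::async_pipe(io), {}};

    // Parent-side copies of the in-between pipes; they close when this list
    // goes out of scope, so EOF can cascade from one child to the next.
    std::list<bp::async_pipe> connectors;

    bp::async_pipe* in = &chain.first_in;
    for (std::size_t i = 0; i < cmds.size(); ++i) {
        bp::async_pipe* out;
        if (i + 1 == cmds.size()) {
            out = &chain.last_out;
        } else {
            connectors.emplace_back(io);
            out = &connectors.back();
        }
        chain.children.emplace_back(cmds[i], bp::std_in < *in, bp::std_out > *out);
        in = out;
    }
    return chain;
}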
Now you can write a much more involved example with many child processes:
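Building on the hypothetical run_chain helper sketched above (and again assuming the question's child binary and file names), a longer chain becomes a one-liner to set up:

#include <fstream>
#include <iterator>

int main() {
    boost::asio::io_context io;

    // Five copies of the question's utility in a row.
    auto chain = run_chain(io, {"child", "child", "child", "child", "child"});

    // Pump data.bin into the head of the chain, then close to signal EOF.
    std::ifstream ifs("data.bin", std::ios::binary);
    std::string input((std::istreambuf_iterator<char>(ifs)), {});
    boost::asio::async_write(chain.first_in, boost::asio::buffer(input),
        [&](boost::system::error_code, std::size_t) { chain.first_in.close(); });

    // Collect whatever falls out of the tail until EOF.
    std::string output;
    boost::asio::async_read(chain.last_out, boost::asio::dynamic_buffer(output),
        [](boost::system::error_code, std::size_t) {});

    io.run();
    for (auto& c : chain.children)
        c.wait();

    std::ofstream("result.bin", std::ios::binary) << output;
}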
Which still roundtrips as expected: Live On Coliru
Closing Thoughts¹
I'd package that up some more, e.g. into a Chain class, and perhaps use boost::process::group so you get better control of all processes in the chain. I have some pretty inspirational answers up on this site showing off the async interface of Boost Process some more, in case you want to see more: https://stackoverflow.com/search?tab=newest&q=user%3a85371%20process%20async
Examples include downloading the first N million primes asynchronously from a website and uncompressing them on the fly, having two UCI chess engines play a game of chess against each other, automating invocation of ffmpeg using the pipe interface, etc.

¹ no pun intended