How to read binary data over a pipe from another process in Python?

I launch another process using the following command:

p = subprocess.Popen(binFilePath, shell=False, stdin=PIPE, stdout=PIPE, stderr=PIPE, universal_newlines=False)

According to the documentation, p.stdout should now be a binary stream, because universal_newlines is set to False.

If the other program now sends binary data, how can I read it? The following call does not return, even though a limited amount of data is waiting to be read:

returnedData = p.stdout.read()

I want to read exactly the data that is currently waiting in the pipe, and to block only if nothing is available yet. How do I do that?

There is 1 answer

Regis May (best answer)

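This is not simple, because Python is not designed for this kind of task the way some other programming languages are.

First: You have to switch the pipe to non-blocking mode. Otherwise a call to read() without a size argument blocks until the other process closes its end of the pipe. This is done with the following code (POSIX only, since it relies on fcntl):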

import fcntl
import os

fd = p.stdout.fileno()
fl = fcntl.fcntl(fd, fcntl.F_GETFL)  # current file status flags
fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)  # add O_NONBLOCK
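
On Python 3.5 and later, os.set_blocking() should have the same effect as the fcntl calls above. A minimal sketch, assuming a plain os.pipe() as a stand-in for p.stdout.fileno() so that it runs on its own:

```python
import os

# Stand-in for p.stdout.fileno(): an ordinary pipe (assumption, for a self-contained demo).
read_fd, write_fd = os.pipe()

# Clear the blocking mode, i.e. set O_NONBLOCK on the descriptor.
os.set_blocking(read_fd, False)
is_blocking = os.get_blocking(read_fd)

os.close(read_fd)
os.close(write_fd)
```

With a real child process you would pass p.stdout.fileno() instead of read_fd.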

Note: Be aware that p.stdout is not your output stream to the other process. It is the other process's output stream, which makes it your input stream.

With the stream in non-blocking mode, we can proceed.

Second: Wait until data is available. This can be done with select():

import select

streams = [p.stdout]
# Wait up to 5 seconds for p.stdout to become readable.
readable, writable, exceptional = select.select(streams, [], [], 5)
if not readable:
    raise TimeoutError("Timeout of 5 seconds reached!")

As far as I know, the exceptional list will always stay empty here, because we are dealing with pipes.

Third: Now let's read the data:

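temp = bytearray(4096)
numberOfBytesReceived = p.stdout.readinto(temp)
# In non-blocking mode readinto() may return None when no data is available; 0 means end of stream.
if not numberOfBytesReceived:
    raise Exception("No data received!")
data = bytes(temp[:numberOfBytesReceived])  # only the bytes actually read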
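
The three steps can be combined into one helper. The following is only a sketch under some assumptions: read_available() and the small child program are hypothetical names made up for this example, it is POSIX only (select() on pipes), and it swaps readinto() for os.read(), which performs a single read on the descriptor and returns whatever is there, up to the requested size.

```python
import os
import select
import subprocess
import sys


def read_available(stream, timeout=5.0, max_bytes=4096):
    """Wait until `stream` has data (or `timeout` elapses), then return what is there."""
    fd = stream.fileno()
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        raise TimeoutError("no data within %s seconds" % timeout)
    # One read on the descriptor; an empty result means the writer closed the pipe.
    # Do not mix this with stream.read() calls, which go through Python's own buffer.
    return os.read(fd, max_bytes)


# Hypothetical child: writes five raw bytes, then waits until its stdin is closed.
child_code = (
    "import sys; sys.stdout.buffer.write(bytes([0, 1, 2, 254, 255])); "
    "sys.stdout.flush(); sys.stdin.read()"
)
p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)

chunk = read_available(p.stdout, timeout=30.0)  # generous timeout for slow startup

p.stdin.close()  # lets the child exit
p.wait()
p.stdout.close()
```

Because select() already waits for data, this variant also works when the pipe is left in blocking mode.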

Additional information:

Of course, you have no idea how much data the sender actually sent. You have to read repeatedly and check whether you have received all of the sending process's output. You can only be sure of that in two cases: when the process closes the stream (but then a plain read() would return and this kind of I/O implementation would not be needed at all), or when some agreed-upon end-of-data mark has arrived.
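
A loop that reads until such an end-of-data mark arrives could look like the sketch below. The marker value, the helper name read_until_mark() and the child program are all assumptions made up for illustration; the real sender defines what the mark is. It is POSIX only and uses os.read() for each single read.

```python
import os
import select
import subprocess
import sys

END_MARK = b"\x00END\x00"  # hypothetical marker agreed on with the sender


def read_until_mark(stream, mark, timeout=5.0):
    """Collect data from `stream` until `mark` shows up; return the bytes before it."""
    fd = stream.fileno()
    received = bytearray()
    while mark not in received:
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            raise TimeoutError("end mark not received in time")
        chunk = os.read(fd, 4096)
        if not chunk:
            raise EOFError("pipe closed before the end mark arrived")
        received += chunk
    return bytes(received[:received.index(mark)])


# Hypothetical child: sends the message in two pieces, then the mark.
child_code = (
    "import sys, time; out = sys.stdout.buffer; "
    "out.write(b'part1'); out.flush(); time.sleep(0.2); "
    "out.write(b'part2' + b'\\x00END\\x00'); out.flush(); sys.stdin.read()"
)
p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)

message = read_until_mark(p.stdout, END_MARK, timeout=30.0)

p.stdin.close()  # lets the child exit
p.wait()
p.stdout.close()
```

The result is the same whether the data arrives in one read or several, because the loop only stops once the mark is somewhere in the accumulated buffer.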

Additional note:

If you need multiple reads on the pipe to collect one meaningful chunk of data sent by the other process, you will end up reading in a loop and appending to a buffer. Done naively, this means copying data from the temporary buffer that p.stdout.readinto(temp) wrote into to the real buffer where you want to keep the data. readinto() has no offset parameter and always (!) writes to the beginning of the object it is given, unlike comparable calls in some other programming languages. That object does not have to be the whole buffer, though: a memoryview slice of a pre-allocated bytearray is accepted as well, so the data can be written at a specific offset without the extra copy.
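
One way to avoid the extra copy is to hand readinto() a memoryview slice of the final buffer. A minimal sketch, assuming an io.BytesIO object as a stand-in for p.stdout so that it runs without a child process; the buffer size and the 4-byte read size are arbitrary:

```python
import io

source = io.BytesIO(b"abcdefghij")  # stand-in for p.stdout (assumption)

buffer = bytearray(16)              # pre-allocated final buffer
view = memoryview(buffer)
filled = 0

while filled < len(buffer):
    # Read at most 4 bytes directly into the buffer, starting at offset `filled`.
    n = source.readinto(view[filled:filled + 4])
    if not n:  # 0 at end of stream; None is possible on a non-blocking stream
        break
    filled += n

data = bytes(view[:filled])
```

Slicing a memoryview does not copy the underlying bytes, so each readinto() call writes straight into the bytearray.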