output of one file as input to the next


How would the output of one script be passed as the input to another? For example if a.py outputs format.xml then how would a.py call b.py and pass it the argument format.xml? I think it's supposed to work like piping done on the command line.
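
For example, I imagine something like this, but I'm not sure this is how it's normally done (the file handling is just a placeholder):

#!/usr/bin/env python
# a.py -- just a sketch of what I have in mind
import subprocess
import sys

# ... a.py writes format.xml somehow ...
with open('format.xml', 'w') as f:
    f.write('<items/>')

# then hand the file name to the next module
subprocess.check_call([sys.executable, 'b.py', 'format.xml'])

and b.py would presumably pick the file name up from sys.argv[1].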

I've been hired by a bunch of scientists with domain-specific knowledge, but sometimes their computer programming requirements don't make sense. There's a long chain of "modules", and my boss is really adamant about one module being one Python script, with the output of one module being the input of the next. I'm very new to Python, but if this design pattern rings a bell for anyone, let me know.

Worse yet, the project is to be converted to executable format (using py2exe), and there still has to be the same number of executable files as .py files.


There are 2 answers

Henrik

The pattern makes sense in some cases, but for me it is mainly when you want to be able to run each module as a self-contained executable.

I.e., should you want to use the script from within FORTRAN or a similar language, the easiest way is to build the Python module into an executable and then call it from FORTRAN.

That would not mean that one module is by definition one Python file, just that it has a single entry point and is in fact executable.

The one-module-per-script requirement could be there to make it easier to locate the code, or to mail it to someone for code inspection or peer review (done often in scientific communities).

So the requirements may be a mix of technical and social concerns.

Anyway, back to the problem.

I would use the subprocess module to call the next module (with close_fds set to True).

If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed. (Unix only). Or, on Windows, if close_fds is true then no handles will be inherited by the child process. Note that on Windows, you cannot set close_fds to true and also redirect the standard handles by setting stdin, stdout or stderr.
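
A minimal sketch of such a call, assuming a.py has just written format.xml and that b.py takes the file name as its first argument (names taken from the question):

import subprocess
import sys

# run the next module in its own process; with close_fds=True the child
# does not inherit our open file descriptors/handles
subprocess.check_call([sys.executable, 'b.py', 'format.xml'],
                      close_fds=True)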

Denis Patrikeev

To emulate a | b shell pipeline in Python:

#!/usr/bin/env python
from subprocess import check_call

# let the shell create the pipe between the two programs
check_call('a | b', shell=True)

The a program writes to its stdout stream and knows nothing about the b program; the b program reads from its stdin and knows nothing about the a program.
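
If you'd rather not use shell=True, a rough equivalent can be built with Popen by wiring a's stdout to b's stdin (a sketch, assuming a and b are on PATH):

#!/usr/bin/env python
from subprocess import Popen, PIPE

# start `a` with its stdout connected to a pipe
a = Popen(['a'], stdout=PIPE)
# start `b` reading its stdin from that pipe
b = Popen(['b'], stdin=a.stdout)
a.stdout.close()  # so `a` gets SIGPIPE if `b` exits early
b.wait()
a.wait()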

A more flexible approach is to define the functions and classes in the a.py and b.py modules so that they work with Python objects, and to implement the command-line interface that produces/consumes XML in terms of these functions in the if __name__ == "__main__" block, e.g. a.py:

#!/usr/bin/env python
import sys
import xml.etree.ElementTree as etree

def items():
    # produce the Python objects this module works with
    yield {'name': 'a'}
    yield {'name': 'b'}

def main():
    # command-line interface: serialize the items to XML on stdout
    parent = etree.Element("items")
    for item in items():
        etree.SubElement(parent, 'item', attrib=item)
    etree.ElementTree(parent).write(sys.stdout)  # set encoding="unicode" on Python 3

if __name__ == "__main__":
    main()
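
For symmetry, b.py could look something like this — only a sketch, with main() parsing the XML from stdin and consume() matching the b.consume() used in the in-process example below:

#!/usr/bin/env python
import sys
import xml.etree.ElementTree as etree

def consume(item):
    # do whatever b is supposed to do with a single item
    print(item)

def main():
    # command-line interface: read the XML produced by a.py from stdin
    tree = etree.parse(sys.stdin)
    for element in tree.getroot():
        consume(dict(element.attrib))

if __name__ == "__main__":
    main()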

This lets you avoid the unnecessary serialization to XML and deserialization from XML when the scripts are not called from the command line:

#!/usr/bin/env python
import a, b

# pass Python objects around directly; no XML involved
for item in a.items():
    b.consume(item)

Note: item can be an arbitrary Python object such as a dict or an instance of a custom class.