Can Pweave play nice with Ruffus?


I am interested in developing self-documenting pipelines.

Can I wrap Ruffus tasks in Pweave chunks?

Pweave and Ruffus
==============================================================

**Let's see if Pweave and ruffus can play nice**


<<load_imports>>=
import time
from ruffus import *
@

**Do this**
<<task1>>=
task1_param = [
                    [ None, 'job1.stage1'], # 1st job
                    [ None, 'job2.stage1'], # 2nd job
              ]
@files(task1_param)
def first_task(no_input_file, output_file):
    open(output_file, "w")
@

I get the feeling the Ruffus decorators are throwing Pweave off:

$ Pweave ruffus.Pnw
Processing chunk 1 named load_imports
Processing chunk 2 named task1
<type 'exceptions.TypeError'>
("unsupported operand type(s) for +: 'NoneType' and 'str'",)

Perhaps there is a workaround?


There are 2 answers

Leo Goodstadt (best answer)

I am the author of Ruffus and have just checked changes into the Google source code repository that allow Ruffus to cooperate with Pweave. The fix will be in the next release.

You can get the latest (fixed) source with the following command line if you are impatient:

hg clone https://[email protected]/p/ruffus/ 

Leo

The details are as follows:

Ruffus uses the fully qualified name (including the module name) of each Ruffus task function to identify code uniquely, so that pipeline tasks can be referred to by name.

The Pweave code was very straightforward. Nice! Pweave sends code to the Python interpreter to be exec-ed chunk by chunk. Of course, chunks do not belong to any "module", so task functions have function.__module__ values of None rather than a string.

A single judicious str() converting None to "None" seems to have solved the problem.
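To make the mechanism concrete, here is a minimal sketch of the failure and the fix. It is not the actual Ruffus or Pweave source: the helpers task_name_unsafe and task_name_safe are hypothetical names invented for illustration, and it assumes the chunk is exec-ed in a namespace that has no "__name__" entry, which is what leaves __module__ set to None.

```python
# Illustrative sketch only: neither helper below is real Ruffus code.

def task_name_unsafe(func):
    # Raises TypeError when func.__module__ is None (None + str).
    return func.__module__ + "." + func.__name__

def task_name_safe(func):
    # str() turns None into "None", so the concatenation always works.
    return str(func.__module__) + "." + func.__name__

# Assumption: the chunk is exec-ed in a namespace without "__name__",
# so the function it defines is not attributed to any module.
chunk = "def first_task():\n    pass\n"
namespace = {}
exec(chunk, namespace)
first_task = namespace["first_task"]

try:
    task_name_unsafe(first_task)
    unsafe_error = None
except TypeError as err:
    unsafe_error = str(err)

safe_name = task_name_safe(first_task)
print(first_task.__module__)
print(unsafe_error)
print(safe_name)
```

The unsafe version fails with the same kind of TypeError shown in the question, while the str() version produces a usable, if odd-looking, task name.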

Leo

Noah

For the record, pweave works fine with decorators.

This has to do with how ruffus identifies which function is which -- the function actually has to belong to a module file, as the function.__module__ property is used. I'm not sure that you can trick it into including all the information needed to create these function identifiers.

You can see the errors for yourself if you edit the pweb.py script included with Pweave so that the try:...except: statements in the pweave() function are more verbose (the easiest way is to comment out the try and except parts). The errors you get come from the deepest bits of Ruffus.
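As a rough illustration of what "more verbose" could look like, the sketch below runs chunks with exec and keeps the full traceback instead of only the exception type and arguments. It is not Pweave's actual code: run_chunks is a hypothetical helper, and the failing chunk simply reproduces a None + str error of the same kind as the one in the question.

```python
import traceback

def run_chunks(chunks):
    """Hypothetical helper: exec each (name, code) chunk in one shared
    namespace and collect a full traceback for every chunk that fails."""
    namespace = {}
    reports = []
    for name, code in chunks:
        try:
            exec(code, namespace)
        except Exception:
            # format_exc() keeps the whole call stack, not just the message.
            reports.append((name, traceback.format_exc()))
    return reports

chunks = [
    ("load_imports", "import time\n"),
    ("task1", "None + 'first_task'\n"),  # stand-in for the failing chunk
]

reports = run_chunks(chunks)
for name, report in reports:
    print("Chunk %s failed:\n%s" % (name, report))
```

With the full traceback in hand, it is much easier to see which line inside a library raised the error.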

I'd suggest staying away from a complex library like Ruffus for didactic purposes, as Ruffus in particular uses a number of hacks (syntactic sugar, if you will) to provide a simple user interface. If you're dead set on using it for this purpose, you could try contacting the author, who has been quite responsive to my feature requests. He might have some ideas about how to do this.