In Bazel is there a concept for creating genrules based on an input file?


Imagine a Bazel-based build workflow in which a genrule() takes the content of a file (say sources-list.txt) and creates one file for each line of sources-list.txt. Suppose you also know that each line and its output are independent of all the others, i.e. every line could just as well define an independent build rule. So if only a few lines change in a large sources-list.txt, you'd be better off creating the output for only those lines, to avoid unnecessary build cost.

With Bazel and genrule() alone, one approach would be to preprocess sources-list.txt (before bazel build) to create BUILD files with a rule for each line, run Bazel against those, and let the cache take care of redundant builds.
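For illustration, the preprocessing step could be sketched roughly as follows. This is only a minimal sketch under a few assumptions: every non-empty, non-comment line of sources-list.txt is already a valid Bazel target name, expensive-computation is the placeholder command from the snippet further down, and the function names are made up for this example. The command writes to `$@` (the declared output of a single-output genrule) rather than to `<uid>.out` in the working directory, since a genrule has to write its output where Bazel expects it.

```python
"""Sketch: render one genrule per line of a sources list (names are hypothetical)."""
from pathlib import Path


def render_genrule(uid: str) -> str:
    """Return BUILD-file text for a single, independent genrule."""
    return (
        "genrule(\n"
        f'    name = "{uid}",\n'
        "    srcs = [],\n"
        f'    outs = ["{uid}.out"],\n'
        # $@ expands to the path of the single declared output.
        f'    cmd = "expensive-computation {uid} -o $@",\n'
        ")\n"
    )


def render_build(lines) -> str:
    """Turn the lines of the sources list into the text of a BUILD file."""
    uids = [line.strip() for line in lines]
    # Skip blank lines and comment lines (an assumption about the file format).
    uids = [uid for uid in uids if uid and not uid.startswith("#")]
    return "\n".join(render_genrule(uid) for uid in uids)


def write_build(sources_path: str, build_path: str) -> None:
    """Read the sources list and write the generated BUILD file next to it."""
    lines = Path(sources_path).read_text().splitlines()
    Path(build_path).write_text(render_build(lines))
```

Calling `write_build("sources-list.txt", "BUILD")` before `bazel build` would then produce one target per line. Since each genrule's action depends only on its own command line, changing one line of the list should leave the other actions unchanged and cacheable.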

But is it also possible to have some function create those rules 'inside' Bazel? (in order to avoid creating code which is hard to understand and hard to debug, IMHO)

Something like

new_rules = [
    genrule(
        name=uid,
        srcs = [],
        outs=[uid + ".out"],
        cmd = "expensive-computation {} -o {}.out".format(uid, uid),
    ) for uid in open("sources-list.txt").readlines()
]

Background: in reality I'm processing a requirements.txt (Python) containing package names, versions and hashes on each line, each of which can be handled separately. I'd like to take as much advantage of a remote cache as possible, since handling each package takes up to a couple of minutes.


There are 2 answers

Brian Silverman

Have you tried it? If some of your outputs are 100% identical, I believe Bazel will avoid re-executing downstream actions that only use files which were regenerated with identical contents. I've run across this concept in the source code before; it's called "change pruning".

Note that actions consuming these files need to make sure that only the necessary files are declared as inputs. If you use the entire ctx.files.attr_with_the_genrule as inputs to an action in a downstream rule, then that action will be re-executed when any one of them changes. The trick is to use some other rule (probably a custom one) to pull out only the file(s) you want as inputs to the actions.

Benjamin Peterson

BUILD files may load .bzl files with the load function and list directories (recursively) with glob. Any other I/O is disallowed. So, a snippet like the one in the opening post is precluded.
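Since loading .bzl files is allowed, one possible middle ground is to keep the list itself in a generated .bzl file and leave the rule logic in a hand-written BUILD file. The following is a rough sketch of that idea, not a recommendation: the constant name SOURCE_UIDS, the file name sources_list.bzl and the function name are assumptions made up for this example, and each line is again assumed to be a valid target name.

```python
def render_bzl(lines, constant: str = "SOURCE_UIDS") -> str:
    """Render the non-empty lines of the sources list as a Starlark list constant."""
    uids = [line.strip() for line in lines if line.strip()]
    body = "".join(f'    "{uid}",\n' for uid in uids)
    return f"{constant} = [\n{body}]\n"


# A BUILD file could then load the generated constant and use a list
# comprehension much like the one in the opening post (Starlark, shown
# here as a comment; sources_list.bzl is a hypothetical file name):
#
#   load(":sources_list.bzl", "SOURCE_UIDS")
#
#   [genrule(
#       name = uid,
#       srcs = [],
#       outs = [uid + ".out"],
#       cmd = "expensive-computation {} -o $@".format(uid),
#   ) for uid in SOURCE_UIDS]
```

This still needs a generation step outside of Bazel, but the generated part shrinks to a plain list of strings, which may be easier to read and debug than generated BUILD files.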

Workspace rules may be used to work around this by generating a BUILD file. E.g., the pip rules of rules_python read requirements.txt files and generate BUILD files.