Imagine a Bazel-based build workflow in which a genrule() takes the content of a file (say sources-list.txt) and creates an output file for each line of sources-list.txt. You also know that each line and its output are independent of all the others, i.e. every line could just as well define an independent build rule. So if only a few lines change in a large sources-list.txt, you'd be better off regenerating the output for only those lines that have changed, to avoid unnecessary build cost.
With Bazel and genrule() alone, one approach to achieve this would be to preprocess sources-list.txt (before bazel build) to create BUILD files with a rule for each line, run Bazel against those, and let the cache take care of redundant builds.
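A minimal sketch of that preprocessing step, assuming a hypothetical generate_build.py that is run before bazel build and an existing generated/ package directory, might look like this:

#!/usr/bin/env python3
# generate_build.py -- hypothetical preprocessing step, run before `bazel build`.
# Reads sources-list.txt and writes one genrule() per line into generated/BUILD,
# so the cache only re-runs the expensive command for lines whose rule changed.

RULE_TEMPLATE = """genrule(
    name = "{uid}",
    srcs = [],
    outs = ["{uid}.out"],
    cmd = "expensive-computation {uid} -o $@",
)

"""

def main():
    with open("sources-list.txt") as src:
        uids = [line.strip() for line in src if line.strip()]
    with open("generated/BUILD", "w") as build:
        build.write("# Generated by generate_build.py -- do not edit.\n\n")
        for uid in uids:
            build.write(RULE_TEMPLATE.format(uid=uid))

if __name__ == "__main__":
    main()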
But is it also possible to have some function create those rules 'inside' Bazel, in order to avoid writing preprocessing code that is hard to understand and hard to debug (IMHO)?
Something like:

new_rules = [
    genrule(
        name = uid,
        srcs = [],
        outs = [uid + ".out"],
        cmd = "expensive-computation {} -o {}.out".format(uid, uid),
    )
    for uid in open("sources-list.txt").read().splitlines()
]
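The closest legal Starlark I can think of would be a load-time macro; this is only a sketch, and it assumes the lines are already available as a Starlark list (for example in a small generated uids.bzl), since a BUILD or .bzl file cannot open() arbitrary source files at load time:

# expensive.bzl -- hypothetical macro; expands a Starlark list into genrules.
def expensive_computations(uids):
    for uid in uids:
        native.genrule(
            name = uid,
            srcs = [],
            outs = [uid + ".out"],
            cmd = "expensive-computation {} -o $@".format(uid),
        )

# BUILD -- usage, assuming uids.bzl defines UIDS = ["pkg-a", "pkg-b", ...]:
load(":expensive.bzl", "expensive_computations")
load(":uids.bzl", "UIDS")

expensive_computations(UIDS)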
Background: in reality I'm processing a Python requirements.txt containing a package name, version and hashes on each line, each of which can be handled separately. And I'd like to take as much advantage of a remote cache as possible, since handling each package takes up to a couple of minutes.
Have you tried it? If some of your outputs are 100% identical, I believe Bazel will avoid re-executing downstream actions that only use the files which were re-generated with identical contents. I've run across this concept in the source code before; it's called "change pruning".
Note that actions consuming these files will need to ensure that only the necessary files are declared as inputs. If you use the entire ctx.files.attr_with_the_genrule as inputs to an action in a downstream rule, then that action will be re-executed whenever any of them changes. The trick is to use some other rule (probably a custom one) to pull out only the file(s) you want as inputs to the actions.
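As a rough illustration of that last point, a custom rule that forwards just one of the generated files could look roughly like this (the rule and attribute names here are made up for illustration, not an existing API):

# select_file.bzl -- sketch of a rule that exposes only one file from a target,
# so downstream actions are not invalidated when sibling outputs change.
def _select_file_impl(ctx):
    for f in ctx.files.src:
        if f.basename == ctx.attr.filename:
            return [DefaultInfo(files = depset([f]))]
    fail("no file named '%s' in %s" % (ctx.attr.filename, ctx.attr.src.label))

select_file = rule(
    implementation = _select_file_impl,
    attrs = {
        "src": attr.label(allow_files = True),
        "filename": attr.string(mandatory = True),
    },
)

A downstream genrule (or other rule) would then depend on the select_file target instead of on all the generated outputs, so only a change in that one file triggers re-execution.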