Avoiding an oracle, or separating oracles of the same type


I have the following situation:

  • find-deps is an external program that is very quick to run, and discovers dependency information, similar to ghc -M. Its output is some file deps.
  • compile is an external program that is very slow to run; unlike ghc --make, it is very slow even if none of the inputs have changed.

So the idea is to add a Shake rule that runs find-deps to produce deps, parse it into a list of files srcs, and then the compilation rule would need srcs to ensure that compile is only re-run if any of the sources discovered by find-deps has changed.

The tricky part is that find-deps needs to alwaysRerun, to discover newly-depended-on source files. So now if the compile rule depends on deps to get the list of files, it will also alwaysRerun. The standard solution would be to use an oracle: we can add an oracle that needs deps and parses it into a list of files, and then the compile rule would first ask for that list of source files, and only need them. So there is no alwaysRerun on the need chain of compile.

However, in my case, I am not writing a particular Shakefile. Instead, I am writing a library of reusable Rules that users can use to make their own main Shakefile. So I'd need to package it up as something like

myRules :: FilePath -> Rules ()
myRules dir = do
    dir </> "deps" %> \depFile -> do
        alwaysRerun
        cmd_ (Cwd dir) "find-deps" ["-o", depFile]

    dir </> "exe" %> \exeFile -> do
        srcs <- askOracle $ Sources dir
        need srcs
        cmd_ (Cwd dir) "compile" ["-o", exeFile]

But where would I put the addOracle $ \(Sources dir) -> ... part that would need [dir </> "deps"], parse it, and return a list of source files? I can't put it in myRules, because then two invocations of myRules with different directories will try to install an oracle handler twice for the same type. And I can't make dir part of the oracle question type, because it is a term-level variable, so I can't lift it into a Symbol index of the query.

And that leaves me with something super-lame like having an includeThisOnlyOnce :: Rules () that the user has to remember to include exactly once in their Shakefile.
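To make the problem concrete, such a once-only rule might look roughly like the sketch below. The Sources newtype and the assumption that deps lists one source path per line are for illustration only; the real parser depends on what find-deps writes.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving, TypeFamilies #-}
import Control.Monad (void)
import Development.Shake
import Development.Shake.Classes (Binary, Hashable, NFData)
import Development.Shake.FilePath

-- Oracle question: "which sources does the project in this directory use?"
newtype Sources = Sources FilePath
    deriving (Show, Eq, Hashable, Binary, NFData)

type instance RuleResult Sources = [FilePath]

-- Must be included exactly once per Shakefile: a second addOracle for the
-- same question type is reported by Shake as a duplicate.
includeThisOnlyOnce :: Rules ()
includeThisOnlyOnce = void $ addOracle $ \(Sources dir) ->
    -- readFile' also tracks dir/deps as a dependency of the oracle.
    -- ASSUMPTION: deps contains one source path per line.
    lines <$> readFile' (dir </> "deps")
```

With this in place, askOracle (Sources dir) in the exe rule only triggers a recompile when the returned list changes, but the user still has to wire in includeThisOnlyOnce by hand.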

So my question is:

  • Is there a way to track dependencies (i.e. to avoid running compile when no source files have changed) without involving an oracle?
  • Alternatively, is there a way to separate oracles of the same type, by somehow scoping them so that I could add this Sources oracle only to the context of each individual invocation of myRules someDir?

There is 1 answer

Answer by Neil Mitchell (best answer)

Is there a way to track dependencies without involving an oracle?

Yes. If the output of find-deps doesn't change at all, then compile won't be re-run. You can achieve that by setting shakeChange in ShakeOptions to a Change value such as ChangeModtimeAndDigest, but that is a global setting. Alternatively, you can put the output of find-deps somewhere such as foo.deps.out and then call copyFileChanged "foo.deps.out" "foo.deps", which leaves foo.deps and its timestamp untouched if the contents haven't changed.
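The copyFileChanged route can be sketched as a variant of myRules from the question. This is a minimal sketch, not a tested build: the deps.out file name and the parseDeps helper are hypothetical, the one-path-per-line format is an assumption, and the find-deps and compile invocations are copied unchanged from the question.

```haskell
import Development.Shake
import Development.Shake.FilePath

-- HYPOTHETICAL parser; ASSUMPTION: deps lists one source path per line.
parseDeps :: String -> [FilePath]
parseDeps = filter (not . null) . lines

myRules :: FilePath -> Rules ()
myRules dir = do
    let depsOut = dir </> "deps.out"  -- scratch output, rewritten every run
        deps    = dir </> "deps"      -- stable copy that other rules depend on

    -- find-deps always runs, but only touches the scratch file.
    depsOut %> \out -> do
        alwaysRerun
        cmd_ (Cwd dir) "find-deps" ["-o", out]

    -- copyFileChanged needs depsOut and copies it only if the contents
    -- differ, so deps keeps its old timestamp when nothing changed.
    deps %> \depFile ->
        copyFileChanged depsOut depFile

    -- Re-runs only when deps or one of the listed sources has changed.
    dir </> "exe" %> \exeFile -> do
        srcs <- parseDeps <$> readFile' deps
        need srcs
        cmd_ (Cwd dir) "compile" ["-o", exeFile]
```

The deps rule itself still runs on every build, because its input is rewritten each time, but it is cheap. Since it leaves deps untouched when the contents are identical, Shake sees no change and does not re-run the exe rule. No oracle is registered, so myRules can be called for several directories.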

Is there a way to separate oracles of the same type?

Not easily and immediately, although I can see why it's useful. I can think of two potential routes to solve it:

  1. It would be possible to add addOracleIdempotent which ignored any errors about adding the same oracle repeatedly. That's a moderately easy change to Shake (essentially set a flag in Rules to ignore duplicates).
  2. Alternatively, you could try promoting the dir to the type-level and ensuring each oracle has a different type. It probably makes your API more complicated and requires type magic.

Of all these solutions, I'd use copyFileChanged, as it's simple and local.