I have a drake plan that uses an input folder declared with file_in(). It then reads each file inside the folder and applies a number of transformations. Finally, it joins the results.
If I add a new file, I would like the calculations in the plan to be applied only to that file, and the result then joined to the previous results. However, what the plan actually does is detect a change in the input target and then recalculate every target that depends on it.
Note: the number of files is quite large (several thousand), and the calculations are heavy.
Solution (see landau's answer for a better solution)
This solution complements the answer I marked as accepted:
Any file or directory you declare with file_in() or target(format = "file") is treated as an irreducible unit of data, and this behavior in drake will not change in future development. But you can split up the files among multiple targets so some targets stay up to date if a file changes.
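library(drake)
drake_plan(
  # full.names = TRUE so the returned paths resolve from the working directory
  input = target(
    list.files(file_in("/path/to/folder"), full.names = TRUE),
    format = "file"
  ),
  # one dynamic sub-target per file
  target1 = target(do_stuff1(input), dynamic = map(input))
)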
This creates dynamic sub-targets, so new files produce new sub-targets while the existing sub-targets are not recalculated.
Any file or directory you declare with file_in() or target(format = "file") is treated as an irreducible unit of data, and this behavior in drake will not change in future development. But you can split up the files among multiple targets so some targets stay up to date if a file changes.

Created on 2020-09-04 by the reprex package (v0.3.0)
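
The code that accompanied this answer is not preserved here. As a minimal sketch of what "splitting the files among multiple targets" could look like with static branching, assuming each file gets its own file_in() call: the file names, the result and joined target names, and the use of rbind() to join are illustrative assumptions, and do_stuff1() is the hypothetical function from the question.

```r
library(drake)

# Hypothetical file list; in practice this might come from
# list.files("/path/to/folder", full.names = TRUE).
files <- c("data/a.csv", "data/b.csv")

plan <- drake_plan(
  # One static target per file, each tracking only its own file.
  result = target(
    do_stuff1(file_in(file)),
    transform = map(file = !!files)
  ),
  # Join all per-file results (rbind() is an assumption about their shape).
  joined = target(
    rbind(result),
    transform = combine(result)
  )
)

plan
```

Because the file list is evaluated when the plan is built, the plan has to be rebuilt after a file is added. Targets for unchanged files should then remain up to date, and only the new target and the joined target would need to run.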
With dynamic branching
Dynamic branching over files is trickier, and file_in() is for static targets only. Even then, it may be suboptimal to create a dynamic sub-target for every single file because you have thousands of them. It is probably better to batch files into groups and give each group to a sub-target. But if you still want to dynamically branch over every single file, here is the way to do it that ensures each file is properly reproducibly tracked for changes.

Created on 2020-09-17 by the reprex package (v0.3.0)
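
The batching suggestion can be illustrated in base R, independently of drake. This is only a sketch: batch_files() is a hypothetical helper, and the batch size and file names are made up for the example.

```r
# Hypothetical helper: split a character vector of paths into
# consecutive batches of at most `batch_size` paths each.
batch_files <- function(paths, batch_size = 100) {
  stopifnot(batch_size >= 1)
  if (length(paths) == 0) {
    return(list())
  }
  # Paths 1..batch_size get id 1, the next batch_size get id 2, and so on.
  batch_id <- ceiling(seq_along(paths) / batch_size)
  unname(split(paths, batch_id))
}

# Made-up paths standing in for the real folder contents.
paths <- sprintf("/path/to/folder/file_%04d.csv", 1:250)
batches <- batch_files(paths, batch_size = 100)

lengths(batches)  # 100 100 50
```

Each element of batches could then be handed to one sub-target. Note that batching by position means a file inserted in the middle of the listing shifts the later batches, so a grouping that stays stable as files are added (for example, by subfolder or by a prefix of the file name) may invalidate fewer sub-targets.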
This is slightly easier in targets due to tarchetypes::tar_files().
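
For comparison, a rough sketch of how this might look in a _targets.R file. The target names input_files and result are illustrative, the folder path is the placeholder from the question, and do_stuff1() is again the hypothetical per-file function.

```r
library(targets)
library(tarchetypes)

pipeline <- list(
  # tar_files() lists the paths and tracks each file as its own branch.
  tar_files(
    input_files,
    list.files("/path/to/folder", full.names = TRUE)
  ),
  # One dynamic branch per file; do_stuff1() is hypothetical.
  tar_target(
    result,
    do_stuff1(input_files),
    pattern = map(input_files)
  )
)

# A _targets.R file ends with the list of targets.
pipeline
```

With this layout, adding a file should create one new branch of result while the existing branches stay up to date.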