I have come to value the {targets} package for doing the janitorial work on calculation pipelines.
Thanks, Will!
Because others have requested external access to the results of my pipeline, I want to make those results available as a MySQL database, and I'm just starting to explore this space. Are there best practices for using {targets} with a database? (I have done quite a bit of searching without success, but maybe I missed something obvious.)
Here are some features of my use case:
- Modifications to the MySQL database will be made by the {targets} pipeline only.
- Others will access the MySQL database with read-only privileges.
- I would like each target of the {targets} pipeline to be a different table in the database.
- Later targets in the {targets} pipeline should be able to access earlier targets (as database tables).
- The {targets} pipeline is highly parallelizable, so static and dynamic branch targets should be able to access the appropriate subset of rows of a database table without first reading the entire table and then filtering it.
In short, I would like to use a database instead of the file system as the backing store for a {targets} pipeline in which subsequent targets can efficiently parallelize across previous targets (stored as database tables).
I know I could re-load the data frames from the .rds files and store them in the database at the end of my pipeline, but the .rds files are large (100s of MB), and reading/writing those files takes a surprisingly large portion of the runtime of the {targets} pipeline. I'm hoping to avoid that overhead.
Finally, I would like to use DoltLab (https://docs.dolthub.com/products/doltlab/installation) as the database server to enable git-like versioning and push/pull of the database. (I'm hoping to work locally in a development branch, merge into a production branch, and push to a DoltLab server for deployment. Other, read-only, users of the database could "pull" the database, too, for faster local access.)
Thanks in advance for any pointers or suggestions about what would work well or poorly here.
Even short responses such as
- "that won't work; you should instead try ..." or
- "take a look at `tar_hook_before()` and `tar_hook_after()`"
will be helpful.