I am very curious about Snakemake, but I'm not sure it fits my use case because I have humans in the loop.
My process is something like this:
- Start with a baseline binary classification model
- Generate 100 examples near the margin (predicted probability near 0.5)
- Have humans label those 100 examples.
- Add the 100 examples to the data set and retrain.
- Go back to the first step, using the retrained model as the new baseline.
Thus, it's a form of active learning with a human in the loop.
Is Snakemake a good fit for this? Or does the human in the loop conflict with the principle of reproducibility? If I should use Snakemake, are there any relevant pointers for something similar?
You can achieve this by treating each iteration of the loop as a distinct Snakemake output, for example a separate set of files per round.
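To make the "one output per loop" idea concrete, here is a minimal sketch in plain Python of how the files for each round might be named and how they depend on each other. This is not Snakemake API. The directory names (`models/`, `candidates/`, `labels/`, `data/initial.csv`) and the helper functions are hypothetical and only illustrate the layout.

```python
# Hypothetical file layout: one set of files per active-learning round.
# All paths and function names are illustrative assumptions, not Snakemake API.

def round_outputs(n: int) -> dict[str, str]:
    """Files that belong to round n."""
    return {
        "model": f"models/round_{n}.pkl",           # model trained at the start of round n
        "candidates": f"candidates/round_{n}.csv",  # 100 examples with predicted p near 0.5
        "labels": f"labels/round_{n}.csv",          # written by the human labelers
    }


def model_inputs(n: int) -> list[str]:
    """Inputs for training the round-n model: base data plus all earlier human labels."""
    return ["data/initial.csv"] + [round_outputs(i)["labels"] for i in range(n)]


def next_step(existing: set[str], n: int) -> str:
    """Report what is pending for round n, given the files that already exist."""
    files = round_outputs(n)
    if files["model"] not in existing:
        missing = [p for p in model_inputs(n) if p not in existing]
        return f"waiting on: {missing}" if missing else f"train {files['model']}"
    if files["candidates"] not in existing:
        return f"generate {files['candidates']}"
    if files["labels"] not in existing:
        return f"human labels needed: {files['labels']}"
    return f"round {n} complete"
```

In a Snakefile, the round number would typically be a wildcard in the output paths, and the human label file would be an input that no rule produces. Snakemake reports a missing input in that case, so the workflow naturally pauses until the labels for that round have been saved.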
So yes, I think Snakemake is a good fit for your process: each loop becomes its own set of outputs, so every iteration can be reproduced on its own.