I'm using tidymodels to tune binary classification random forest models on a moderately imbalanced dataset. The target variable has roughly a 1:7 positive-to-negative ratio, so without subsampling the models are biased toward the majority class. To address this imbalance, I'm exploring a range of under_ratio values with step_downsample (from the themis extension to recipes) in my recipes. Many of the best-performing models, ranked by pr_auc and roc_auc, suggest that some degree of downsampling (around under_ratio = 5) would improve model generalizability. I am currently setting the random seed with the 'seed' argument of step_downsample and, if I understand correctly, this fixes which majority-class instances step_downsample selects within a given fold. I know that all majority-class instances will eventually be represented in the training set across the 10 folds of cross-validation, but I'm also using 5 repeats in my resampling.
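For concreteness, here is a minimal sketch of the kind of recipe described above. It assumes made-up simulated data (sim_df with predictors x1, x2, x3 and a class factor holding exactly 100 positives and 700 negatives) and an arbitrary seed of 123; it only illustrates the setup and says nothing about the fold-by-fold behavior asked about below. It needs the tidymodels and themis packages.

```r
library(tidymodels)  # attaches recipes, dplyr, tibble, among others
library(themis)      # step_downsample() is provided by themis

# Hypothetical simulated data: exactly 100 "pos" and 700 "neg" rows (1:7)
set.seed(1)
n <- 800
sim_df <- tibble(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
score <- sim_df$x1 + rnorm(n)
sim_df$class <- factor(
  ifelse(rank(-score) <= 100, "pos", "neg"),
  levels = c("pos", "neg")  # yardstick treats the first level as the event by default
)

# under_ratio is fixed at 5 here so the recipe can be prepped directly;
# in a tuning workflow it would be under_ratio = tune()
rec <- recipe(class ~ x1 + x2 + x3, data = sim_df) |>
  step_downsample(class, under_ratio = 5, seed = 123)

# With skip = TRUE (the default) downsampling is applied to the training
# data during prep(); bake(new_data = NULL) returns that processed data
down1 <- bake(prep(rec), new_data = NULL)
down2 <- bake(prep(rec), new_data = NULL)

count(down1, class)       # minority class kept, majority class reduced
identical(down1, down2)   # same seed, same data: same rows selected
```

The negatives are reduced to at most under_ratio times the number of positives, while the positives are untouched.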
My questions are:

1) When the random seed is set within a recipe step and the recipe is then applied across 10 folds and 5 repeats, is my understanding accurate that a step such as step_downsample would reselect the same random subset of the majority class in each repeat?
2) Would there be any advantage to allowing a different subset of majority-class instances to be selected within each of the 10 folds during the repeats?
3) What are the general guidelines for ensuring reproducibility in a tidymodels framework that includes an initial train/test split, recipe steps with random downsampling, a resampling object (group_vfold_cv(v = 10, repeats = 5)), the use of workflow_map with "tune_grid" over a regular grid of preliminary parameter ranges, and, finally, tune_bayes on the set of workflows rated highest by several metrics, including pr_auc and roc_auc?
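To make question 3 concrete, the sketch below shows where seeds can enter such a pipeline. Everything in it is an assumption for illustration: simulated data with a hypothetical grouping column site, the ranger engine, a tiny grid, arbitrary seed values, and a short tune_bayes run. For brevity it refines only the top-ranked workflow. It shows the places randomness is controlled, not a recommended setting, and needs tidymodels, themis, and ranger.

```r
library(tidymodels)  # rsample, recipes, parsnip, workflows, workflowsets, tune, dials, yardstick
library(themis)

# Hypothetical data: 40 made-up groups, exactly 100 "pos" vs 700 "neg"
set.seed(1)
n <- 800
sim_df <- tibble(
  site = factor(sample(sprintf("site%02d", 1:40), n, replace = TRUE)),
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n)
)
score <- sim_df$x1 + rnorm(n)
sim_df$class <- factor(ifelse(rank(-score) <= 100, "pos", "neg"),
                       levels = c("pos", "neg"))

# Seed the session RNG before each step that splits or resamples the data
set.seed(2024)
split <- group_initial_split(sim_df, group = site, prop = 0.8)
train <- training(split)

set.seed(2025)
folds <- group_vfold_cv(train, group = site, v = 10, repeats = 5)

# The step's own seed controls which majority-class rows it keeps
rec <- recipe(class ~ x1 + x2 + x3, data = train) |>
  step_downsample(class, under_ratio = tune(), seed = 123)

rf_spec <- rand_forest(trees = 200, min_n = tune()) |>
  set_engine("ranger") |>
  set_mode("classification")

wf_set <- workflow_set(preproc = list(down = rec), models = list(rf = rf_spec))

# Small regular grid over the preliminary ranges (values are arbitrary)
grid <- tidyr::crossing(min_n = c(5L, 20L), under_ratio = c(1, 3, 5))
mets <- metric_set(pr_auc, roc_auc)  # tune_bayes optimizes the first metric

# workflow_map() sets its 'seed' before each function execution
grid_res <- workflow_map(wf_set, "tune_grid", seed = 2026,
                         resamples = folds, grid = grid, metrics = mets)

# Refine the top-ranked workflow with tune_bayes(), seeding the RNG first
best_id <- rank_results(grid_res, rank_metric = "pr_auc")$wflow_id[1]
best_wf <- extract_workflow(grid_res, id = best_id)
params <- extract_parameter_set_dials(best_wf) |>
  update(min_n = min_n(c(5L, 20L)), under_ratio = under_ratio(c(1, 5)))

set.seed(2027)
bayes_res <- tune_bayes(best_wf, resamples = folds,
                        initial = extract_workflow_set_result(grid_res, best_id),
                        iter = 3, param_info = params, metrics = mets)
show_best(bayes_res, metric = "pr_auc")
```

Rerunning the script from the top in a fresh session should reproduce the same splits, folds, and tuning path, since every random step is preceded by an explicit seed.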
Everything currently runs fine and I am getting results that seem to make sense given the limitations of my dataset, but I'm wondering if I could gain anything by introducing another (reproducible) level of randomness to my process.
Thank you all for any insights or clarifications you can provide.