I always fit my model and make predictions without using prep(), bake(), or juice():
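library(tidymodels)

# lr_mod (a parsnip model spec) and rec (a recipe) are defined elsewhere
rec_wflow <-
  workflow() %>%
  add_model(lr_mod) %>%
  add_recipe(rec)

# fit() preps the recipe on train_data and fits the model in one step
data_fit <-
  rec_wflow %>%
  fit(data = train_data)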
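A minimal, runnable sketch of the code above follows. It assumes mtcars as the data, a linear regression (linear_reg() with the "lm" engine) as a hypothetical stand-in for lr_mod, and a hypothetical recipe that normalizes two predictors as rec. It shows that fit() trains the recipe without any explicit prep() call: the recipe extracted from the fitted workflow is already prepped.

```r
library(tidymodels)  # attaches recipes, parsnip, workflows, rsample, dplyr

set.seed(123)
# Assumption: mtcars stands in for the question's data
split      <- initial_split(mtcars, prop = 0.75)
train_data <- training(split)
test_data  <- testing(split)

# Hypothetical stand-ins for the question's `lr_mod` and `rec`
lr_mod <- linear_reg() %>% set_engine("lm")
rec    <- recipe(mpg ~ wt + hp, data = train_data) %>%
  step_normalize(all_numeric_predictors())

rec_wflow <- workflow() %>%
  add_model(lr_mod) %>%
  add_recipe(rec)

# No explicit prep()/bake(): fit() estimates the recipe on train_data itself
data_fit <- rec_wflow %>% fit(data = train_data)

# The recipe stored inside the fitted workflow has been trained (prepped)
trained_rec <- extract_recipe(data_fit)
isTRUE(trained_rec$trained)

# predict() bakes test_data with that trained recipe before predicting
preds <- predict(data_fit, new_data = test_data)
```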
Are these functions (prep, bake, juice) only used to visually check the preprocessing results, and not necessary for the fitting/training process?
What is the difference among prep/bake/juice in the R package "recipes"?
I learned this workflow approach from the official tutorial.
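To illustrate the difference, here is a minimal sketch that calls the three functions by hand, assuming a random 24/8 row split of mtcars and a hypothetical normalization recipe. recipe() only declares the steps, prep() estimates them from training data, bake() applies those estimates to a data set, and juice() returns the processed training data. Recent versions of recipes mark juice() as superseded in favor of bake(new_data = NULL).

```r
library(recipes)  # also re-exports the %>% pipe

set.seed(1)
# Assumption: a random 24/8 row split of mtcars stands in for the question's data
idx        <- sample(nrow(mtcars), 24)
train_data <- mtcars[idx, ]
test_data  <- mtcars[-idx, ]

# recipe(): only declares the steps; nothing is estimated yet
rec <- recipe(mpg ~ wt + hp, data = train_data) %>%
  step_normalize(all_numeric_predictors())

# prep(): estimates each step (here the means and SDs of wt and hp) from training data
rec_prepped <- prep(rec, training = train_data)

# bake(): applies the already-estimated steps to a data set, e.g. the test set
test_baked <- bake(rec_prepped, new_data = test_data)

# juice(): returns the processed training data kept by prep() (retain = TRUE by default).
# It is superseded; bake(new_data = NULL) returns the same data.
train_juiced <- juice(rec_prepped)
train_baked  <- bake(rec_prepped, new_data = NULL)
```

For fitting inside a workflow none of these calls is needed; calling them yourself is mainly useful for inspecting the processed data or for fitting a model outside a workflow.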
I've read on another blog that using train_data this way leads to data leakage. I'd like to hear more about that: are these functions related to data leakage?
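The leakage question is really about which rows the recipe's statistics are estimated from. In this sketch (assuming a random split of mtcars and a hypothetical step_normalize() recipe), prepping on train_data yields means computed from the training rows only, whereas prepping on the full data set, shown purely for contrast, lets the test rows influence the statistics that are later applied to the test set.

```r
library(recipes)  # re-exports %>% and tidy()

set.seed(1)
# Assumption: a random 24/8 row split of mtcars stands in for the question's data
idx        <- sample(nrow(mtcars), 24)
train_data <- mtcars[idx, ]
test_data  <- mtcars[-idx, ]

rec <- recipe(mpg ~ wt + hp, data = train_data) %>%
  step_normalize(all_numeric_predictors())

# Leak-free: normalization statistics come from the training rows only
stats_train <- tidy(prep(rec, training = train_data), number = 1)

# Leaky (for contrast only): prepping on all rows lets the test set shape the statistics
stats_all <- tidy(prep(rec, training = mtcars), number = 1)

# Helper (hypothetical, for illustration): pull the estimated mean of wt
wt_mean <- function(stats) {
  unname(stats$value[stats$terms == "wt" & stats$statistic == "mean"])
}

wt_mean(stats_train)  # the mean of train_data$wt
wt_mean(stats_all)    # the mean of mtcars$wt, i.e. it has "seen" the test rows
```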
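Short answer: you are correct. When a recipe is used in a workflow, as in your example, you do not need to call these preprocessing functions yourself: fit() preps the recipe on the training data, and predict() bakes any new data with that prepped recipe. Because the recipe is estimated only from the data passed to fit(), statistics such as normalization means and standard deviations come from the training set alone, which keeps the test set from leaking into the model.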
This is touched on in the tutorial Handle class imbalance in #TidyTuesday climbing expedition data with tidymodels:
I recommend all the tutorials on Julia Silge's blog for understanding tidymodels.
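To check the short answer, this sketch fits the same model and recipe twice: once through a workflow and once with explicit prep() and bake(). The data (mtcars), the linear regression, and the normalization recipe are assumptions for illustration, not taken from the question. If the workflow performs those steps internally, the two sets of test-set predictions should agree up to floating-point tolerance.

```r
library(tidymodels)

set.seed(123)
# Assumption: mtcars stands in for the question's data
split      <- initial_split(mtcars, prop = 0.75)
train_data <- training(split)
test_data  <- testing(split)

# Hypothetical model and recipe
lr_mod <- linear_reg() %>% set_engine("lm")
rec    <- recipe(mpg ~ wt + hp, data = train_data) %>%
  step_normalize(all_numeric_predictors())

# Workflow route: prep/bake happen inside fit() and predict()
wf_fit <- workflow() %>%
  add_model(lr_mod) %>%
  add_recipe(rec) %>%
  fit(data = train_data)
wf_preds <- predict(wf_fit, new_data = test_data)

# Manual route: prep on training data, bake both sets, fit on the baked training data
rec_prepped  <- prep(rec, training = train_data)
manual_fit   <- fit(lr_mod, mpg ~ ., data = bake(rec_prepped, new_data = NULL))
manual_preds <- predict(manual_fit, new_data = bake(rec_prepped, new_data = test_data))

all.equal(wf_preds$.pred, manual_preds$.pred)
```

The workflow route is shorter and makes it harder to accidentally prep the recipe on the wrong data.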