Imagine there is a formula for a simple regression: y~f1+f2+f3, where f1 is a factor with levels A, B, and C, and f2 and f3 are numeric. I'm then using the following recipe:
recipe(y~f1+f2+f3, data) %>%
step_dummy(f1) %>%
step_log(f3)
Question 1. The initial formula eventually turns into y~f1_A+f1_B+f1_C+f2+log(f3), right?
Question 2. If I had added
%>% step_pca(all_numeric_predictors(), num_comp = 5)
would it become
y~PC1+PC2+...+PC5?
Hope it makes sense.
Thanks in advance
For the first question:
Almost! The log step does not rename the variable, so the logged values are simply stored in column f3. The other parts are right.
Question 2:
Yes(ish). The names that come out of step_pca() are designed to be sortable. If you have fewer than 10 components, then the above is right. If you have 10 to 99 components, then they are zero-padded: PC01 ... PC99.
Finally, recipes don't just make a formula to do these computations (you probably didn't mean that, but just to be sure). However, there is a little-known formula method that you can use on a recipe once it is prepared.
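A rough sketch of how that formula method might be used, with made-up data. The names dat, rec, rec_prepped, and rec_formula are placeholders chosen for illustration, not anything from the original post:

```r
library(recipes)  # also attaches dplyr, which supplies %>%

set.seed(2023)

# Hypothetical toy data shaped like the variables in the question
dat <- data.frame(
  y  = rnorm(30),
  f1 = factor(rep(c("A", "B", "C"), times = 10)),
  f2 = rnorm(30),
  f3 = runif(30, min = 1, max = 10)  # strictly positive so log() is defined
)

rec <- recipe(y ~ f1 + f2 + f3, data = dat) %>%
  step_dummy(f1) %>%
  step_log(f3)

# The formula method only works on a trained recipe, so prep() comes first
rec_prepped <- prep(rec)

# Builds a formula from the outcome and predictor columns left after all steps
rec_formula <- formula(rec_prepped)
print(rec_formula)
```

The resulting formula is built from the column names that exist after the steps have run, so f3 should show up as plain f3 rather than log(f3). With the default one_hot = FALSE in step_dummy(), the first factor level acts as the reference, so this toy run should list f1_B and f1_C only.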
Created on 2023-10-28 with reprex v2.0.2
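The column names discussed in both answers can also be checked directly by baking a prepared recipe and looking at names(). The sketch below assumes simulated data, and pc_names() is a hypothetical helper written only for this illustration:

```r
library(recipes)  # also attaches dplyr, which supplies %>%

set.seed(2023)
n <- 40

# Hypothetical toy data shaped like the variables in the question
dat <- data.frame(
  y  = rnorm(n),
  f1 = factor(rep(c("A", "B", "C"), length.out = n)),
  f2 = rnorm(n),
  f3 = runif(n, min = 1, max = 10)  # strictly positive so log() is defined
)

rec <- recipe(y ~ f1 + f2 + f3, data = dat) %>%
  step_dummy(f1) %>%
  step_log(f3)

# new_data = NULL returns the processed training set
baked_names <- names(bake(prep(rec), new_data = NULL))
print(baked_names)  # f3 keeps its name; dummy columns come from f1

# Hypothetical helper: simulate n_pred numeric predictors, keep n_comp
# principal components, and return the names of the component columns
pc_names <- function(n_pred, n_comp) {
  d <- as.data.frame(matrix(rnorm(n * n_pred), ncol = n_pred))
  d$y <- rnorm(n)
  pca_rec <- recipe(y ~ ., data = d) %>%
    step_pca(all_numeric_predictors(), num_comp = n_comp)
  setdiff(names(bake(prep(pca_rec), new_data = NULL)), "y")
}

print(pc_names(6, 5))    # fewer than 10 components: no zero padding
print(pc_names(12, 10))  # 10 components: two-digit, zero-padded names
```

With the default one_hot = FALSE, step_dummy() treats the first level (A here) as the reference, so the baked data should contain f1_B and f1_C; setting one_hot = TRUE would also produce f1_A.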