I am using the Ames Housing data and want to include all the variables with the suffix "SF" in my recipe, so that I can apply step_pca() to the variables that are measured in square feet.
I used reformulate() to no avail:
SF <- reformulate(grep("SF", names(ames), value = TRUE),
                  response = 'Sale_Price')
simple_ames <-
  recipe(SF + Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type + Latitude,
         data = ames_train) %>%
  step_log(Gr_Liv_Area, base = 10) %>%
  step_other(Neighborhood, threshold = 0.01) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_interact(~ Gr_Liv_Area:starts_with('Bldg_Type_')) %>%
  step_ns(Latitude, deg_free = 20) %>%
  step_pca(matches('(SF$)|(Gr_Liv'))
I also tried using grep() directly in the formula:
simple_ames <-
  recipe(Sale_Price ~ paste(grep("SF"), collapse = '+') + Neighborhood +
           Gr_Liv_Area + Year_Built + Bldg_Type + Latitude, data = ames_train) %>%
  step_log(Gr_Liv_Area, base = 10) %>%
  step_other(Neighborhood, threshold = 0.01) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_interact(~ Gr_Liv_Area:starts_with('Bldg_Type_')) %>%
  step_ns(Latitude, deg_free = 20) %>%
  step_pca(matches('(SF$)|(Gr_Liv'))
I am following the examples from Tidy Modeling with R, https://www.tmwr.org/recipes, chapter 8.4.4 (the authors do not explain an efficient way to insert all those variables into a recipe).
Thanks
In the recipes package only the selector functions from recipes and tidyselect are allowed. Custom functions will fail. To do what you want to do try:
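One possible approach, as a minimal sketch under assumptions: build the complete formula once with reformulate(), listing the "SF" columns together with the other predictors, instead of adding a formula object inside another formula. The column names below are a hypothetical subset of the Ames columns, used only so that the snippet runs in base R without loading the data.

```r
# Hypothetical subset of the Ames column names, for illustration only
ames_cols <- c("Sale_Price", "Neighborhood", "Gr_Liv_Area", "Year_Built",
               "Bldg_Type", "Latitude", "First_Flr_SF", "Second_Flr_SF")

# All columns whose name ends in "SF"
sf_vars <- grep("SF$", ames_cols, value = TRUE)

# Combine them with the other predictors and build one complete formula
predictors <- c(sf_vars, "Neighborhood", "Gr_Liv_Area", "Year_Built",
                "Bldg_Type", "Latitude")
ames_formula <- reformulate(predictors, response = "Sale_Price")

print(ames_formula)
```

With the real data, names(ames_train) would take the place of ames_cols, and the resulting formula object could be passed directly as recipe(ames_formula, data = ames_train). Inside the recipe, the PCA step can then pick the columns with a plain selector such as step_pca(ends_with("SF"), Gr_Liv_Area).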
If you insist on a regex, remember that tidyselect matches (which recipes utilises) uses stringr style regex and the following should work:

You can always test your selector if you use tidyselect selectors within recipes by actually applying it to the data you are using, e.g. ames |> select(matches("SF$|Gr_Liv")). That helps to make sure that you operate on the predictors you want.

See also ?recipes::selections for a more thorough explanation.
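The same kind of check can be sketched in base R, assuming a hypothetical handful of Ames-style column names. It uses grep() as a stand-in for matches(), with ignore.case = TRUE because matches() ignores case by default. It also shows that the pattern as typed in the question, '(SF$)|(Gr_Liv', appears to be missing a closing parenthesis and is rejected as an invalid regular expression.

```r
# Hypothetical subset of Ames column names, for illustration only
cols <- c("Gr_Liv_Area", "First_Flr_SF", "Second_Flr_SF",
          "BsmtFin_SF_1", "Latitude")

# Pattern from the answer; ignore.case = TRUE mirrors the default of matches()
hits <- grep("SF$|Gr_Liv", cols, value = TRUE, ignore.case = TRUE)
print(hits)

# Pattern as typed in the question: the second group is never closed,
# so the regex engine raises an error instead of returning matches
bad <- suppressWarnings(try(grep("(SF$)|(Gr_Liv", cols), silent = TRUE))
pattern_is_invalid <- inherits(bad, "try-error")
print(pattern_is_invalid)
```

Note that a name such as BsmtFin_SF_1 is not picked up by "SF$", since the anchor requires "SF" at the very end of the name. If such columns should be included, a selector like contains("SF") would be one option.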