I want to use the `step_impute_knn()` function from the recipes package to impute missing values. It uses Gower distance, which is suitable when the predictors are a mix of categorical and continuous data. However, as far as I can tell there is no way to use this step with `tune()`, because tuning has to be done on a (parsnip) model. The only parsnip KNN model, `nearest_neighbor()`, doesn't offer Gower distance as an option.
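For context, here is a minimal sketch of Gower distance on mixed columns using `gower::gower_dist()` from the gower package, which recipes relies on for this step. The two data frames are hypothetical pairs of rows built from the sample data's `HomePlanet`, `Age` and `VIP` columns.

```r
library(gower)

# Hypothetical rows built from the sample data (HomePlanet, Age, VIP)
x <- data.frame(HomePlanet = c("Europa", "Europa"), Age = c(39, 39),
                VIP = c("False", "False"), stringsAsFactors = FALSE)
y <- data.frame(HomePlanet = c("Europa", "Earth"), Age = c(58, 24),
                VIP = c("True", "False"), stringsAsFactors = FALSE)

# Row-wise distances: x[1, ] vs y[1, ] and x[2, ] vs y[2, ].
# Character columns score 0 (match) or 1 (mismatch); numeric columns score
# |difference| / range; the Gower distance is the average over columns.
d <- gower_dist(x, y)
d
```

Because every column is scaled to [0, 1] before averaging, categorical and numeric predictors can be mixed without dummy-encoding them first.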
Sample data:
```r
train <- structure(list(PassengerId = c("0001_01", "0002_01", "0003_01",
"0003_02", "0004_01", "0005_01"), HomePlanet = c("Europa", "Earth",
"Europa", "Europa", "Earth", NA), CryoSleep = c("False",
"False", "False", "False", "False", "False"), Cabin = c("B/0/P",
"F/0/S", "A/0/S", "A/0/S", "F/1/S", "F/0/P"), Destination = c("TRAPPIST-1e",
"TRAPPIST-1e", "TRAPPIST-1e", "TRAPPIST-1e", "TRAPPIST-1e", "PSO J318.5-22"
), Age = c(39, 24, 58, 33, 16, 44), VIP = c("False", "False",
"True", "False", "False", "False"), RoomService = c(0, 109, 43,
0, 303, 0), FoodCourt = c(0, 9, 3576, 1283, 70, 483), ShoppingMall = c(0,
25, 0, 371, 151, 0), Spa = c(0, 549, 6715, 3329, 565, 291), VRDeck = c(0,
44, 49, 193, 2, 0), Name = c("Maham Ofracculy", "Juanna Vines",
"Altark Susent", "Solam Susent", "Willy Santantines", "Sandie Hinetthews"
), Transported = c("False", "True", "False", "False", "True",
"True")), row.names = c(NA, 6L), class = "data.frame")
```
What I have so far:
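```r
library(tidymodels)

# Learn the imputation from the complete rows only
train_no_na <- train %>%
  na.omit()

# Impute HomePlanet from its 5 nearest neighbours (Gower distance),
# measuring similarity on all other predictors (the step's default)
imp_knn_blueprint <- recipe(Transported ~ ., data = train_no_na) %>%
  step_impute_knn(HomePlanet,
                  impute_with = imp_vars(all_predictors()),
                  neighbors = 5,
                  options = list(nthread = 1, eps = 1e-08))

imp_knn_prep <- prep(imp_knn_blueprint, training = train_no_na)
imp_knn_5 <- bake(imp_knn_prep, new_data = train)
```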
Is there a way to use tidymodels/parsnip workflows to tune the KNN that `step_impute_knn()` uses internally? I've tried reading the function's source code but can't see which engine it uses.
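On the "engine" question: as far as I can tell from the recipes source, `step_impute_knn()` does not go through parsnip at all. It calls `gower::gower_topn()` to find the nearest complete training rows and then takes the mode (nominal columns) or the mean (numeric columns) of their values. Below is a minimal sketch of that idea for the missing `HomePlanet` in row 6 of the sample data. It makes two simplifications that the step itself does not make by default: `k` stands in for `neighbors`, and only three predictors are used.

```r
library(gower)

# Complete "donor" rows (rows 1-5 of the sample data, three predictors)
donors <- data.frame(
  Age        = c(39, 24, 58, 33, 16),
  VIP        = c("False", "False", "True", "False", "False"),
  Spa        = c(0, 549, 6715, 3329, 565),
  HomePlanet = c("Europa", "Earth", "Europa", "Europa", "Earth"),
  stringsAsFactors = FALSE
)
# Row 6, whose HomePlanet is missing
target <- data.frame(Age = 44, VIP = "False", Spa = 291,
                     stringsAsFactors = FALSE)

k <- 3                          # plays the role of `neighbors`
preds <- c("Age", "VIP", "Spa")

# Top-k donors by Gower distance; nn$index is a k x nrow(target)
# matrix of row numbers in `donors`
nn <- gower_topn(target[, preds, drop = FALSE], donors[, preds], n = k)
neighbour_values <- donors$HomePlanet[nn$index[, 1]]

# Categorical column: impute with the most common neighbour value
# (a numeric column would use the mean of the neighbour values instead)
imputed <- names(which.max(table(neighbour_values)))
imputed
```

Since `k` only changes which rows are averaged or voted on, it is an ordinary step argument rather than a model parameter. That is why no parsnip engine is involved.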
EDIT: To be clear, I'd like to tune the `neighbors` argument of `step_impute_knn()` with a grid search rather than trying values by hand.
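You can `tune()` the `neighbors` argument of `step_impute_knn()` the same way as any other hyperparameter in a recipe step. When the recipe is part of a workflow, `tune_grid()` picks up `tune()` placeholders in the recipe as well as in the model.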
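A minimal sketch of that approach follows. The six-row sample is too small to resample, so it assumes modeldata's `credit_data` as a stand-in: it mixes factor and numeric predictors and has missing values. The logistic regression model, the 5-fold CV, and the candidate values 3, 5 and 10 are arbitrary choices for illustration.

```r
library(tidymodels)

# Stand-in data (assumption): factor + numeric predictors with NAs
data("credit_data", package = "modeldata")

set.seed(123)
folds <- vfold_cv(credit_data, v = 5, strata = Status)

# Mark the imputation step's `neighbors` argument for tuning
rec <- recipe(Status ~ ., data = credit_data) %>%
  step_impute_knn(all_predictors(), neighbors = tune()) %>%
  step_dummy(all_nominal_predictors())

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

# Candidate values; the column name must match the parameter id
knn_grid <- tibble(neighbors = c(3, 5, 10))

# The recipe (including the KNN imputation) is re-prepped in every
# resample for every candidate value
res <- tune_grid(wf, resamples = folds, grid = knn_grid,
                 metrics = metric_set(roc_auc))

show_best(res, metric = "roc_auc")

# Plug the best value back into the workflow
best <- select_best(res, metric = "roc_auc")
final_wf <- finalize_workflow(wf, best)
```

Passing an integer instead of a tibble (e.g. `grid = 5`) lets `tune_grid()` generate the candidate values itself from the parameter's default range.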