I'm using the recipe() function from the tidymodels packages to impute missing values and fix imbalanced data.
Here is my data:
mer_df <- mer2 %>%
  filter(!is.na(laststagestatus2)) %>%
  select(Id, Age_Range__c, Gender__c, numberoflead, leadduration, firsttouch, lasttouch, laststagestatus2) %>%
  mutate_if(is.character, factor) %>%
  mutate_if(is.logical, as.integer)
# A tibble: 197,836 x 8
Id Age_Range__c Gender__c numberoflead leadduration firsttouch lasttouch
<fct> <fct> <fct> <int> <dbl> <fct> <fct>
1 0010~ NA NA 2 5.99 Dealer IB~ Walk in
2 0010~ NA NA 1 0 Online Se~ Online S~
3 0010~ NA NA 1 0 Walk in Walk in
4 0010~ NA NA 1 0 Online Se~ Online S~
5 0010~ NA NA 2 0.0128 Dealer IB~ Dealer I~
6 0010~ NA NA 1 0 OB Call OB Call
7 0010~ NA NA 1 0 Dealer IB~ Dealer I~
8 0010~ NA NA 4 73.9 Dealer IB~ Walk in
9 0010~ NA Male 24 0.000208 OB Call OB Call
10 0010~ NA NA 18 0.000150 OB Call OB Call
# ... with 197,826 more rows, and 1 more variable: laststagestatus2 <fct>
Here is my code:
library(tidymodels)
library(themis)  # provides step_smote()

mer_rec <- recipe(laststagestatus2 ~ ., data = mer_train) %>%
  step_medianimpute(numberoflead, leadduration) %>%
  step_knnimpute(Gender__c, Age_Range__c, firsttouch, lasttouch) %>%
  step_other(Id, firsttouch) %>%
  step_other(Id, lasttouch) %>%
  step_dummy(all_nominal(), -laststagestatus2) %>%
  step_smote(laststagestatus2)

mer_rec
mer_rec %>% prep()
It works fine up to this point:
Data Recipe
Inputs:
role #variables
outcome 1
predictor 7
Training data contained 148377 data points and 147597 incomplete rows.
Operations:
Median Imputation for 2 items [trained]
K-nearest neighbor imputation for Id, ... [trained]
Collapsing factor levels for Id, firsttouch [trained]
Collapsing factor levels for Id, lasttouch [trained]
Dummy variables from Id, ... [trained]
SMOTE based on laststagestatus2 [trained]
But when I run the bake() function, it gives the following error:
mer_rec %>% prep() %>% bake(new_data=NULL) %>% count(laststagestatus2)
Error: Please pass a data set to `new_data`.
Could anyone help me figure out what I'm missing here?
There is a fix in the development version of recipes to get this up and working. You can install via:
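The install command itself did not survive in this copy of the answer. As a hedged sketch, development versions of tidymodels packages are commonly installed from GitHub like this, assuming the devtools package is available (remotes::install_github() works the same way); the exact command the answer used is an assumption here:

```r
# Assumption: devtools is installed; install.packages("devtools") if not.
# "tidymodels/recipes" is the GitHub repository for the recipes package.
devtools::install_github("tidymodels/recipes")
```

Restart the R session after installing so the new version of recipes is the one that gets loaded.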
Then you can bake() with new_data = NULL to get out the transformed training data.
Created on 2020-10-12 by the reprex package (v0.3.0.9001)
If you are unable to install packages from GitHub, you could use juice() to do the same thing.
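As a minimal sketch of the two options, here is a toy example. The data frame toy_train and the single step_center() step are hypothetical stand-ins, not the data or recipe from the question; the point is only that juice() and bake(new_data = NULL) both return the processed training set stored in the prepped recipe.

```r
library(recipes)

# Hypothetical toy data standing in for mer_train
toy_train <- data.frame(
  outcome      = factor(c("won", "lost", "won", "lost", "won", "lost")),
  numberoflead = c(2, 1, 1, 4, 24, 18),
  leadduration = c(5.99, 0, 0, 73.9, 0.01, 0)
)

# A deliberately small recipe: center the two numeric predictors
toy_rec <- recipe(outcome ~ ., data = toy_train) %>%
  step_center(numberoflead, leadduration)

# prep() estimates the step and, by default, retains the processed training set
prepped <- prep(toy_rec, training = toy_train)

# Option 1: juice() returns the retained, processed training data
juiced <- juice(prepped)

# Option 2: bake() with new_data = NULL (needs a recipes version with the fix)
baked <- bake(prepped, new_data = NULL)

print(juiced)
print(isTRUE(all.equal(juiced, baked)))
```

Passing the training data explicitly, as in bake(prepped, new_data = toy_train), is not a substitute for a recipe like the one in the question: steps created with skip = TRUE, which is the default for step_smote(), are not applied when baking new data, so the SMOTE-balanced rows only appear in the output of juice() or bake(new_data = NULL).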