I have the following code comparing the rmse of models that differ only in the polynomial term.
library(tidyverse)
data(mtcars)
cv_mtcars = mtcars %>%
crossv_kfold(k = 10)
cv_mtcars %>%
mutate(model1 = map(train, ~lm(disp ~ wt, data = .)),
model2 = map(train, ~lm(disp ~I(wt^2), data = .)),
model3 = map(train, ~lm(disp ~I(wt^3), data = .)),
model4 = map(train, ~lm(disp ~I(wt^4), data = .)),
model5 = map(train, ~lm(disp ~I(wt^5), data = .)),
model6 = map(train, ~lm(disp ~I(wt^6), data = .)),
order1 = map2_dbl(model1, test, ~rmse(.x, .y)),
order2 = map2_dbl(model2, test, ~rmse(.x, .y)),
order3 = map2_dbl(model3, test, ~rmse(.x, .y)),
order4 = map2_dbl(model4, test, ~rmse(.x, .y)),
order5 = map2_dbl(model5, test, ~rmse(.x, .y)),
order6 = map2_dbl(model6, test, ~rmse(.x, .y))) %>%
select(order1,order2,order3,order4,order5,order6) %>% gather(1:6,key=model,value=value) %>%
ggplot()+
geom_point(aes(x=factor(model),y=value))+
labs(y="rmse",x="polynomial",title="Model Assesment",subtitle="disp~I(wt^x)")
Is there a more efficient way to iterate through my models? I feel like I am writing more code than I need to.
You can iterate through the models with an outer call to
map
to iterate over the polynomial orders and an inner call tomap
to iterate over the 10 folds. In the code below, I've usedpoly(wt, i)
instead ofI(wt^i)
, becauseI(wt^i)
generates a polynomial with only the highest-order term, whilepoly(wt, i)
generates a polynomial with terms of all orders up to the highest order. I've saved thermse
for each fold in themodel_cv
object, but you can, of course, pipe it directly into ggplot instead.