The website where I am trying to run my code uses an old version of R and does not accept ranger as a library, so I have to use the caret package. I am trying to process about 800,000 rows in my training data frame, and here is the code I use:
control <- trainControl(method = 'repeatedcv',
                        number = 3,
                        repeats = 1,
                        search = 'grid')
tunegrid <- expand.grid(.mtry = c(sqrt(ncol(train_1))))
fit <- train(value ~ .,
             data = train_1,
             method = 'rf',
             ntree = 73,
             tuneGrid = tunegrid,
             trControl = control)
Following previous posts, I tried tuning my control parameters. Is there any way to make the model train faster? Can I specify settings so that it fits just one model with the parameters I set, rather than trying multiple options?
This is my ranger code, which I have optimized and which currently gives an accurate model:
fit <- ranger(value ~ .,
              data = train_1,
              num.trees = 73,
              max.depth = 35,
              mtry = 7,
              importance = 'impurity',
              splitrule = "extratrees")
Thank you so much for your time
When you specify method='rf', caret is using the randomForest package to build the model. If you don't want to do all the cross-validation that caret is useful for, just build your model using the randomForest package directly. You can specify values for ntree, mtry, etc.
Note that the randomForest package is slow (or just won't work) for large datasets. If ranger is unavailable, have you tried the Rborist package?
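As a rough illustration of fitting a single model with fixed parameters, here is a minimal sketch. The small synthetic train_1 below is a hypothetical stand-in for the real data, and mtry = 2 is chosen only because the toy data has three predictors; the asker's own values would go there instead. The second half assumes caret is installed and uses trainControl(method = "none"), which fits one model on the full training set without resampling, provided the tuning grid has exactly one row.

```r
library(randomForest)
library(caret)

# Hypothetical stand-in for train_1 (illustration only, not the asker's data)
set.seed(42)
train_1 <- data.frame(
  x1 = rnorm(200),
  x2 = rnorm(200),
  x3 = runif(200)
)
train_1$value <- 2 * train_1$x1 - train_1$x2 + rnorm(200, sd = 0.1)

# Option 1: call randomForest directly, one forest, no resampling
fit <- randomForest(
  value ~ .,
  data  = train_1,
  ntree = 73,
  mtry  = 2   # must not exceed the number of predictors (3 in this toy data)
)
pred <- predict(fit, newdata = train_1)

# Option 2: stay in caret but switch resampling off
control   <- trainControl(method = "none")
fit_caret <- train(
  value ~ .,
  data      = train_1,
  method    = "rf",
  ntree     = 73,
  tuneGrid  = data.frame(mtry = 2),  # exactly one candidate, so nothing is tuned
  trControl = control
)
```

Either way only one forest is grown. On something like 800,000 rows, randomForest arguments such as sampsize and nodesize may be worth experimenting with to cut run time, though how much they help, and what they cost in accuracy, depends on the data.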