In the R code below, I would have expected the junk column to have no effect on the SVM calculations, as the formula "true~aaaa+bbbb+cccc" clearly excludes it.
However, deleting the statement "data$junk <- NULL" causes the SVM calculation to crash. As listed below, the code runs fine and returns:
```
Parameter tuning of ‘e1071::svm’:
- sampling method: 5-fold cross validation
- best parameters:
 gamma cost
   0.5    4
- best performance: 0.17
```
Here is the code:
```r
# On error: print the call stack, reset the handler, then stop the script
options( error = function() {
  traceback( 2 )
  options( error = NULL )
  stop( "exiting after script error" )
})

data <- data.frame(
  true = as.factor( c( "a", "b", "b", "a", "a", "b", "a", "b", "b", "b", "a", "a", "b", "a", "a", "b", "b", "b", "a", "a", "a", "a" ) ),
  aaaa = c( 2, 1, 0, 0, 3, 0, 0, 1, 0, 0, 5, 1, 2, 0, 7, 2, 1, 0, 1, 2, 3, 4 ),
  bbbb = c( 4, 0, 0, 0, 4, 1, 4, 1, 0, 0, 6, 1, 1, 4, 0, 2, 0, 2, 1, 2, 3, 4 ),
  cccc = c( 3, 2, 0, 0, 3, 1, 4, 1, 2, 0, 0, 7, 1, 3, 5, 2, 2, 1, 1, 2, 3, 4 ),
  junk = c(NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 )  # only NA is in row 1; junk is not in the formula
)
data$junk <- NULL  # Comment out this statement to reproduce the crash.
str( data )  # str() prints on its own; wrapping it in print() only adds "NULL"

# 5-fold cross-validated grid search over gamma and cost for a C-classification SVM
e1071::tune( e1071::svm,
  true ~ aaaa + bbbb + cccc,
  data = data,
  type = 'C-classification',
  scale = TRUE,
  ranges = list( gamma = 2^(-1:1), cost = 2^(2:4) ),
  tunecontrol = e1071::tune.control( sampling = 'cross', cross = 5 )
)
```
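One way to probe the expectation in the first paragraph is to compare how many rows survive missing-value handling when only the formula's variables are considered versus when every column is. The sketch below uses the same data (with junk written as `c(NA, rep(1, 21))`); the names `f`, `n_formula` and `n_whole` are illustrative. It rests on an assumption I have not verified against the e1071 source: that some step inside `tune()`, for example predicting on the held-out fold, applies `na.omit()` to the whole data frame rather than only to the formula's variables. If that is the case, the fold containing row 1 would get one prediction fewer than it has true labels.

```r
# Diagnostic sketch: rows kept by formula-based NA handling vs. whole-frame NA handling.
# Assumption (unverified): some step in tune()/predict() may apply na.omit() to all columns.
data <- data.frame(
  true = as.factor(c("a", "b", "b", "a", "a", "b", "a", "b", "b", "b", "a",
                     "a", "b", "a", "a", "b", "b", "b", "a", "a", "a", "a")),
  aaaa = c(2, 1, 0, 0, 3, 0, 0, 1, 0, 0, 5, 1, 2, 0, 7, 2, 1, 0, 1, 2, 3, 4),
  bbbb = c(4, 0, 0, 0, 4, 1, 4, 1, 0, 0, 6, 1, 1, 4, 0, 2, 0, 2, 1, 2, 3, 4),
  cccc = c(3, 2, 0, 0, 3, 1, 4, 1, 2, 0, 0, 7, 1, 3, 5, 2, 2, 1, 1, 2, 3, 4),
  junk = c(NA, rep(1, 21))  # the only NA is in row 1
)
f <- true ~ aaaa + bbbb + cccc

# model.frame() evaluates only the variables named in the formula,
# so the NA in junk does not remove row 1
n_formula <- nrow(model.frame(f, data))

# na.omit() on the whole data frame drops any row with an NA in any column,
# including columns the formula never mentions
n_whole <- nrow(na.omit(data))

cat("rows kept via model.frame(f, data):", n_formula, "\n")
cat("rows kept via na.omit(data):       ", n_whole, "\n")
```

With this data, `model.frame()` keeps all 22 rows while `na.omit(data)` keeps 21, so any step that mixed the two would be comparing vectors of different lengths.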
Version information:
```
$ Rscript --version
Rscript (R) version 4.3.2 (2023-10-31)
$ R
> installed.packages()
e1071 "4.3.2"
```
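Since the tuning runs once junk is removed, a more general workaround (a sketch, not a confirmed fix for the underlying behaviour) is to hand `tune()` only the columns the formula names, using base R's `all.vars()`. The names `f` and `model_data` are illustrative, and the `tune()` call is wrapped in `requireNamespace()` only so the sketch still runs where e1071 is not installed. The resulting tuning scores may differ from the output above because the cross-validation folds are random.

```r
# Hypothetical workaround sketch: pass tune() only the columns used by the formula,
# so an NA in an unrelated column (junk) cannot reach svm() or predict().
data <- data.frame(
  true = as.factor(c("a", "b", "b", "a", "a", "b", "a", "b", "b", "b", "a",
                     "a", "b", "a", "a", "b", "b", "b", "a", "a", "a", "a")),
  aaaa = c(2, 1, 0, 0, 3, 0, 0, 1, 0, 0, 5, 1, 2, 0, 7, 2, 1, 0, 1, 2, 3, 4),
  bbbb = c(4, 0, 0, 0, 4, 1, 4, 1, 0, 0, 6, 1, 1, 4, 0, 2, 0, 2, 1, 2, 3, 4),
  cccc = c(3, 2, 0, 0, 3, 1, 4, 1, 2, 0, 0, 7, 1, 3, 5, 2, 2, 1, 1, 2, 3, 4),
  junk = c(NA, rep(1, 21))  # the only NA is in row 1
)
f <- true ~ aaaa + bbbb + cccc

# all.vars() returns the variable names used in the formula,
# here "true", "aaaa", "bbbb", "cccc"; junk is left out
model_data <- data[, all.vars(f), drop = FALSE]
str(model_data)

# Run the tuning only if e1071 is available
if (requireNamespace("e1071", quietly = TRUE)) {
  set.seed(1)  # cross-validation folds are random; fix the seed for repeatability
  result <- e1071::tune(e1071::svm,
    f,
    data = model_data,
    type = "C-classification",
    scale = TRUE,
    ranges = list(gamma = 2^(-1:1), cost = 2^(2:4)),
    tunecontrol = e1071::tune.control(sampling = "cross", cross = 5)
  )
  print(result)
}
```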