R | NA/NaN/Inf in foreign function call | e1071 SVM


Dataset: https://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King-Pawn%29

Code:

require("e1071")

# Load the data into bank
bank <- read.csv("~/R/SVM/kr-vs-kp.data", header = FALSE)  # the UCI file has no header row
colnames(bank)[ncol(bank)] <- 'to.classify'
bank$to.classify <- as.factor(bank$to.classify)

# Divide the data into TRAIN and TEST sets
index <- 1:nrow(bank)
testIndex <- sample(index, trunc(length(index)/3))
testSet <- bank[testIndex,]
trainSet <- bank[-testIndex,]

# Learning sigmoid tuned nu-classification model
svm.nu.tune.model.sigmoid <- best.svm(
  to.classify ~ .,
  data = trainSet,
  coef0 = c(0, 1, 10, 20, 30),
  gamma = c(0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30),
  cost = c(1, 3, 10, 30, 100),
  nu = c(0.1, 0.3, 0.5, 0.7, 0.9),
  na.action = na.omit,
  kernel = 'sigmoid',
  type = 'nu-classification'
)
print(svm.nu.tune.model.sigmoid)

Error:

Error in predict.svm(ret, xhold, decision.values = TRUE) : 
  NA/NaN/Inf in foreign function call (arg 8)

The algorithm works fine on any other combination of kernel and type. This is the only problematic one.


There are 2 answers

Ruthger Righart On

The problem appears to be that you mix parameters for different SVM types in the same call: nu belongs to nu-classification, while cost belongs to C-classification. If you tune only nu with nu-classification (and tune cost with C-classification in a separate call), it should work.

For example:

svm.nu.tune.model.sigmoid <- best.svm(
  to.classify ~ .,
  data = trainSet,
  nu = c(0.1, 0.3, 0.5, 0.7, 0.9),
  na.action = na.omit,
  kernel = 'sigmoid',
  type = 'nu-classification'
)

Hadoop On

I finally figured it out.

It seems that, for this dataset, some values of nu and coef0 are infeasible. Instead of skipping those combinations, the tuning run aborts with an error and throws away all the work done so far.
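One way to keep a grid search alive when some combinations fail is to run the grid by hand and wrap each fit in tryCatch. The following is only a minimal sketch, not how best.svm works internally: safe_cv_accuracy is a hypothetical helper, the data is a two-class subset of the built-in iris set used as a stand-in for trainSet, and nu = 1.5 is included deliberately as a value that svm() is expected to reject.

```r
library(e1071)

set.seed(1)  # cross-validation folds are random

# Hypothetical helper: cross-validated accuracy for one parameter combination,
# or NA if svm() raises an error for that combination.
safe_cv_accuracy <- function(formula, data, ...) {
  tryCatch({
    fit <- svm(formula, data = data, cross = 5, ...)
    fit$tot.accuracy
  }, error = function(e) NA_real_)
}

# Stand-in data (assumption): two iris species instead of the chess dataset.
demo <- droplevels(iris[iris$Species != "setosa", ])

# nu = 1.5 lies outside (0, 1] and is there only to trigger an error.
grid <- expand.grid(nu = c(0.1, 0.5, 1.5), coef0 = c(0, 1))

grid$cv_accuracy <- mapply(function(nu, coef0) {
  safe_cv_accuracy(Species ~ ., demo,
                   type = "nu-classification", kernel = "sigmoid",
                   nu = nu, coef0 = coef0)
}, grid$nu, grid$coef0)

# Failed combinations stay in the table as NA; which.max() ignores them.
best <- grid[which.max(grid$cv_accuracy), ]
print(grid)
print(best)
```

The rows with NA show which combinations were rejected, so the rest of the grid is still usable.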

coef0 - apparently the usable values lie in the interval 0 <= coef0 <= 1, although I could not find this documented anywhere. For anything larger than 1 it crashes with the NaN/Inf error.
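For context, the e1071 documentation gives the sigmoid kernel as tanh(gamma * u'v + coef0). The sketch below only illustrates how that formula behaves for the coef0 values from the question. It does not prove where the NaN/Inf error comes from. The function sigmoid_kernel and the vectors u and v are made up for the illustration.

```r
# Hypothetical re-implementation of the sigmoid kernel formula,
# written out only to show how coef0 shifts the result.
sigmoid_kernel <- function(u, v, gamma, coef0) {
  tanh(gamma * sum(u * v) + coef0)
}

# Two arbitrary example vectors (assumption, not taken from the dataset).
u <- c(1, -1, 0.5)
v <- c(-0.5, 2, 1)

coef0_values <- c(0, 1, 10, 30)
kernel_values <- sapply(coef0_values, function(c0) {
  sigmoid_kernel(u, v, gamma = 0.1, coef0 = c0)
})

print(data.frame(coef0 = coef0_values, kernel = kernel_values))
```

With coef0 at 10 or above, tanh saturates and the kernel value is essentially 1 whatever the inputs are, so large coef0 values leave the model very little information to work with. That may be related to the failures seen above, but it is a guess.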

nu - for the kr-vs-kp dataset, anything larger than 0.7 results in a 'nu infeasible' error. For a different, smaller dataset, different values (0 < x < 0.4) are infeasible.
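The dependence on the dataset has a concrete source: LIBSVM, which e1071 wraps, rejects nu when nu * (n1 + n2) / 2 > min(n1, n2) for any pair of classes with n1 and n2 samples. So the upper limit depends on class balance. A small sketch of that check, where max_feasible_nu is a hypothetical helper and the class counts are invented for the example:

```r
# Hypothetical helper: the largest nu that passes LIBSVM's pairwise
# feasibility check for the class counts found in y.
max_feasible_nu <- function(y) {
  counts <- as.vector(table(droplevels(as.factor(y))))
  stopifnot(length(counts) >= 2)          # need at least two classes
  pairs <- combn(counts, 2)               # every pair of class counts
  bound <- 2 * pmin(pairs[1, ], pairs[2, ]) / (pairs[1, ] + pairs[2, ])
  min(1, bound)                           # nu can never exceed 1
}

# Invented example: 70 samples of one class, 30 of the other.
y <- factor(c(rep("won", 70), rep("nowin", 30)))
limit <- max_feasible_nu(y)
print(limit)  # 2 * 30 / (70 + 30) = 0.6

# Keep only the candidate nu values that pass the check.
candidates <- c(0.1, 0.3, 0.5, 0.7, 0.9)
print(candidates[candidates <= limit])
```

During tuning, each cross-validation training fold has its own class counts, so the effective limit can be somewhat lower than the one computed on the full training set. Leaving some margin below the computed value is the safer choice.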

I hope this helps someone.