Format of predictions is invalid. It couldn't be coerced to a list Error in r


I am using ranger to fit a random forest. As the evaluation metric I am using the ROC AUC score, computed with cvAUC. After making predictions, when I try to evaluate the AUC score I get an error: Format of predictions is invalid. It couldn't be coerced to a list. I think this is because the predictions include a Levels part that shows the unique levels of the predictions, but I could not get rid of that part. A minimal reproducible example that throws the error is below:

# install.packages(c("ranger", "cvAUC"))  # run once if the packages are missing
library(caret)
library(ranger)
library(cvAUC)

# Columns for training set
cat.column <- c("cat", "dog", "monkey", "shark", "seal")
num.column <- c(1,2,5,7,9)
class <- c(0,1,0,0,1)

train.set <- data.frame(num.column, cat.column, class)

# Columns for test set
cat.column <- c("cat", "elephant-shrew", "monkey", "monkey", "seal")
num.column <- c(1,11,5,6,8)
class <- c(1,0,1,0,1)

test.set <- data.frame(num.column, cat.column, class)

# Drop the target variable from the test set
target.test <- test.set["class"]
test.set <- test.set[,!names(test.set) %in% "class"]

# Fit random forest
rf <- ranger(formula = as.factor(class) ~ ., data = train.set, verbose = FALSE)
# Get predictions
pred <- predict(rf, test.set)
predictions <- pred$predictions

# Get AUC score
auc <- AUC(as.factor(predictions), as.factor(unlist(target.test)), label.ordering = NULL)

cat(auc)


1 answer

Elia (best answer)

You get the error because AUC expects a numeric vector, not a factor. In addition, in this example the test set contains a new level in cat.column (elephant-shrew) that does not appear in the training set. It is good practice to declare all the values a variable can take in both the training and the test set.
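The factor-versus-numeric point can be checked in isolation with base R. This is only a sketch: the predictions vector below is a made-up stand-in for pred$predictions, and it assumes the check behind the error accepts plain vectors but not factors.

```r
# Hypothetical stand-in for pred$predictions from a classification forest
predictions <- factor(c("1", "0", "1", "0", "1"), levels = c("0", "1"))

is.vector(predictions)   # FALSE: a factor carries levels and class attributes
is.factor(predictions)   # TRUE

# Two ways to get a plain numeric vector out of the factor
codes  <- as.numeric(predictions)                 # internal level codes: 2 1 2 1 2
values <- as.numeric(as.character(predictions))   # original labels:      1 0 1 0 1

is.vector(codes)         # TRUE
```

Either conversion preserves the ordering of the two classes, so a rank-based metric such as AUC should not care which one is used; going through as.character() is only needed when the original 0/1 values matter.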

library(caret)
library(cvAUC)
library(ranger)
# Columns for training set
cat.column <- c("cat", "dog", "monkey", "shark", "seal")
num.column <- c(1,2,5,7,9)
class <- factor(c(0,1,0,0,1),levels = c(0,1))

train.set <- data.frame(num.column, cat.column, class, stringsAsFactors = FALSE)
# Columns for test set
cat.column <- c("cat", "elephant-shrew", "monkey", "monkey", "seal")
num.column <- c(1,11,5,6,8)
class <- factor(c(1,0,1,0,1), levels = c(0,1))

test.set <- data.frame(num.column, cat.column, class, stringsAsFactors = FALSE)

# Drop the target variable from the test set
target.test <- test.set["class"]
test.set <- test.set[,!names(test.set) %in% "class"]

# Fit random forest
rf <- ranger(formula = class ~ ., data = train.set, verbose = FALSE)
# Get predictions
pred <- predict(rf, test.set)
predictions <- pred$predictions

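# Get AUC score
# as.numeric() turns the factor predictions into their level codes (1, 2),
# which gives AUC the plain numeric vector it expects
auc <- AUC(as.numeric(predictions), target.test$class, label.ordering = NULL)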
cat(auc)

As you can see, I slightly changed the data preparation steps. First, if your class column is the outcome of a classification task, it is better to coerce it to a factor as early as possible. Second, if the training and test sets do not share the same values of a character variable (as in your example, where cat.column contains elephant-shrew in the test set but not in the training set), it is better to handle that variable as a character. In this case you can pass stringsAsFactors = FALSE to data.frame() to keep character variables as characters.
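To make the mismatch between the two sets visible, here is a small base-R sketch using the cat.column values from the question. The variable names (train_cat, test_cat, all_levels and so on) are illustrative only, and building factors over a shared level set is one possible way to declare all values up front, not necessarily what ranger needs internally.

```r
train_cat <- c("cat", "dog", "monkey", "shark", "seal")
test_cat  <- c("cat", "elephant-shrew", "monkey", "monkey", "seal")

# Values that occur in the test set but were never seen in training
unseen <- setdiff(unique(test_cat), unique(train_cat))
print(unseen)

# Declare one shared set of levels so both factors agree
all_levels   <- union(train_cat, test_cat)
train_factor <- factor(train_cat, levels = all_levels)
test_factor  <- factor(test_cat,  levels = all_levels)

# Both factors now carry the same levels, and no test value becomes NA
identical(levels(train_factor), levels(test_factor))
anyNA(test_factor)
```

Running the setdiff() check before fitting is a cheap way to notice values such as elephant-shrew early, whichever representation (character or factor) is used afterwards.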