How to locate individual samples that have been misclassified using kNN, in R?


Using the iris dataset in R, I am looking at classification with kNN. I want to find the test-set observations that have been misclassified. I was able to produce scatter plots that give a visual impression of which observations were misclassified, but how can I locate and list all of them? The code I used to produce the scatter plots is below; it is from https://rpubs.com/Tonnia/irisknn

set.seed(12345)
allrows <- 1:nrow(iris)
trainrows <- sample(allrows, replace = FALSE, size = 0.8 * length(allrows))
train_iris <- iris[trainrows, 1:4]
train_label <- iris[trainrows, 5]
table(train_label)
test_iris <- iris[-trainrows, 1:4]
test_label <- iris[-trainrows, 5]
table(test_label)

library(class)
error.train <- numeric(30)  # one training error rate per value of k
for (k in 1:30) {
  pred_iris <- knn(train = train_iris, test = train_iris, cl = train_label, k = k)
  error.train[k] <- 1 - mean(pred_iris == train_label)
}

error.test <- numeric(30)  # one test error rate per value of k
for (k in 1:30) {
  pred_iris <- knn(train = train_iris, test = test_iris, cl = train_label, k = k)
  error.test[k] <- 1 - mean(pred_iris == test_label)
}

plot(error.train, type="o", ylim=c(0,0.15), col="blue", xlab = "K values", ylab = "Misclassification errors")
lines(error.test, type = "o", col="red")
legend("topright", legend=c("Training error","Test error"), col = c("blue","red"), lty=1:1)

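library(ggplot2)
library(dplyr)  # provides the %>% pipe

# Final model with k = 6, evaluated on the test set
pred_iris <- knn(train = train_iris, test = test_iris, cl = train_label, k = 6)
result <- cbind(test_iris, pred_iris)
combinetest <- cbind(test_iris, test_label)

# Test observations colored by predicted species
result %>%
  ggplot(aes(x = Petal.Width, y = Petal.Length, color = pred_iris)) +
  geom_point(size = 3)

# Test observations colored by actual species
combinetest %>%
  ggplot(aes(x = Petal.Width, y = Petal.Length, color = test_label)) +
  geom_point(size = 3)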

Answer by Kezrael (best answer)

In your code, pred_iris holds the predictions from the most recent knn() call, which at the end of the script is the k = 6 model applied to the test set.

Once combinetest has been built near the end of your code, you can subset it like this:

combinetest[test_label != pred_iris,]

This returns the test observations whose predicted label differs from the true label.
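
The same subsetting can be extended to show the prediction next to the true label and to recover the original iris row numbers. A minimal sketch, assuming the same seed, 80/20 split, and k = 6 as in the question; the names compare, wrong, and misclassified are illustrative and not from the original code:

```r
library(class)

# Recreate the split from the question (assumed unchanged)
set.seed(12345)
trainrows   <- sample(seq_len(nrow(iris)), size = 0.8 * nrow(iris))
train_iris  <- iris[trainrows, 1:4]
train_label <- iris[trainrows, 5]
test_iris   <- iris[-trainrows, 1:4]
test_label  <- iris[-trainrows, 5]

pred_iris <- knn(train = train_iris, test = test_iris, cl = train_label, k = 6)

# Features, true label, and prediction side by side (illustrative column names)
compare <- data.frame(test_iris, actual = test_label, predicted = pred_iris)

# TRUE where the prediction disagrees with the true label
wrong <- compare$actual != compare$predicted

# The misclassified test observations
misclassified <- compare[wrong, ]
print(misclassified)

# Positions within the test set, and the corresponding row names in iris
which(wrong)
rownames(test_iris)[wrong]
```

The logical vector wrong can be reused, for example mean(wrong) gives the test misclassification rate for this k.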

Alternatively, with more readable tidyverse syntax:

library(tidyverse)
combinetest %>%
    filter(test_label != pred_iris)
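
The filter() call above takes pred_iris from the workspace rather than from a column of combinetest. If you would rather keep everything in one data frame, the predictions can be stored as a column first. A sketch under the same assumptions as the question (seed, 80/20 split, k = 6); it loads only dplyr instead of the full tidyverse, and the names results, confusion, iris_row, actual, and predicted are hypothetical:

```r
library(class)
library(dplyr)

# Recreate the split from the question (assumed unchanged)
set.seed(12345)
trainrows   <- sample(seq_len(nrow(iris)), size = 0.8 * nrow(iris))
train_iris  <- iris[trainrows, 1:4]
train_label <- iris[trainrows, 5]
test_iris   <- iris[-trainrows, 1:4]
test_label  <- iris[-trainrows, 5]

pred_iris <- knn(train = train_iris, test = test_iris, cl = train_label, k = 6)

# Keep the original iris row number in an explicit column so it does not
# depend on row names surviving the pipeline
results <- test_iris %>%
  mutate(iris_row  = as.integer(rownames(test_iris)),
         actual    = test_label,
         predicted = pred_iris)

# Cross-tabulation: off-diagonal cells count the misclassifications
confusion <- table(actual = results$actual, predicted = results$predicted)
print(confusion)

# List only the misclassified observations
misclassified <- results %>%
  filter(actual != predicted)
print(misclassified)
```

The table gives a quick check on the listing: the number of rows in misclassified should equal the sum of the off-diagonal counts.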