R: Is it possible to use centroids in data frame format to classify


A scientific publication presented a pancreatic cancer classifier, and I want to apply this classifier to my own expression set. The only information the authors provide is a data frame of centroids (rows: genes, columns: subtypes; https://doi.org/10.1053/j.gastro.2018.08.033, supplementary table 2). So far I haven’t figured out how to reproduce this classification model for prediction.

All the packages that I found calculate the centroids from expression data and labels, and output a model that predicts a new set. Unfortunately the labels are not published with this article, so recalculating the centroids is not possible.

Question: How can I use the centroids alone to classify another expression set?

There is 1 answer

Answered by G5W

You can use k-Nearest Neighbors with only the centroids. Use the centroids as the training data, with one class label per centroid, and set k = 1 so that each observation is assigned to its single nearest centroid. (If k equals the number of centroids, every centroid gets exactly one vote and knn breaks the resulting tie at random.) Since you do not provide any data, I will give an example using the iris data. The specific centroids don't matter here, but they must be in a data frame with the same columns as the data that you wish to classify, one row per centroid. For a table with genes in rows and subtypes in columns, that means transposing it first. You can call the classes whatever you want; I just called them A, B and C.

## Define some centroids (one row per species)
Centroids = aggregate(iris[,1:4], list(iris$Species), mean)[,-1]

library(class)
## k = 1: assign each row of the test data to its nearest centroid
pred = knn(train = Centroids, test = iris[,1:4], cl = factor(c("A", "B", "C")), k = 1)

## Cross-tabulate the predicted labels against the known species
table(pred, iris$Species)
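
For the layout described in the question (centroids with genes in rows and subtypes in columns), the same nearest-centroid idea can be sketched in base R without any extra package. This is only a sketch under assumptions: the toy data, the gene names and the helper `classify_by_centroid` are hypothetical, Euclidean distance is my choice rather than something confirmed by the paper, and the expression values are assumed to be on the same scale as the centroids.

```r
# Hypothetical toy data: centroids are genes x subtypes, expression is genes x samples.
centroids <- data.frame(
  A = c(1, 0, 0,  2),
  B = c(0, 1, 0, -1),
  C = c(0, 0, 1,  0),
  row.names = c("g1", "g2", "g3", "g4")
)
expr <- cbind(
  s1 = c( 0.9,  0.1,  0.0,  1.8, 5.0),
  s2 = c( 0.1,  1.2, -0.1, -0.9, 4.0),
  s3 = c( 0.0,  0.1,  0.8,  0.1, 3.0),
  s4 = c( 1.1, -0.2,  0.1,  2.2, 6.0),
  s5 = c(-0.1,  0.9,  0.2, -1.1, 2.0)
)
rownames(expr) <- c("g1", "g2", "g3", "g4", "g5")  # g5 has no centroid and is ignored

# Hypothetical helper: assign each sample to the subtype with the nearest centroid.
classify_by_centroid <- function(expr, centroids) {
  # Keep only genes present in both tables, in the same order
  genes <- intersect(rownames(expr), rownames(centroids))
  stopifnot(length(genes) > 0)
  x   <- t(as.matrix(expr[genes, , drop = FALSE]))       # samples x genes
  cen <- t(as.matrix(centroids[genes, , drop = FALSE]))  # subtypes x genes
  k   <- nrow(cen)
  # Euclidean distances between all rows, then keep the sample-vs-centroid part
  full <- as.matrix(dist(rbind(cen, x)))
  d <- full[-seq_len(k), seq_len(k), drop = FALSE]       # samples x subtypes
  # Index of the smallest distance in each row
  nearest <- max.col(-d, ties.method = "first")
  setNames(factor(colnames(d)[nearest], levels = colnames(d)), rownames(d))
}

pred <- classify_by_centroid(expr, centroids)
print(pred)
```

If a correlation-based assignment is preferred, `cor(expr[genes, ], as.matrix(centroids[genes, ]))` gives a samples x subtypes matrix from which the largest value per row can be picked instead. Which measure the original classifier used would need to be checked against the publication.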