I've successfully run Partitioning Around Medoids (PAM) using the pam function (cluster package in R), and now I would like to use the results to assign new observations to the previously defined clusters/medoids.
Put another way: given the k clusters/medoids found by the pam function, which medoid is closest to an additional observation that was not in the initial dataset?
x<-matrix(c(1,1.2,0.9,2.3,2,1.8,
3.2,4,3.1,3.9,3,4.4),6,2)
x
[,1] [,2]
[1,] 1.0 3.2
[2,] 1.2 4.0
[3,] 0.9 3.1
[4,] 2.3 3.9
[5,] 2.0 3.0
[6,] 1.8 4.4
library(cluster)
pam(x,2)
Observations 1, 3 and 5 are clustered together, as are observations 2, 4 and 6; observations 1 and 6 are the medoids:
Medoids:
ID
[1,] 1 1.0 3.2
[2,] 6 1.8 4.4
Clustering vector:
[1] 1 2 1 2 1 2
Now, to which cluster/medoid should y be assigned?
y<-c(1.5,4.5)
Oh, and in case you have several solutions: computing time matters, because my actual dataset is big.
Try this for k clusters in general:
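A minimal sketch of one way to do this, assuming Euclidean distance (the default metric in pam) and that the medoid coordinates are taken from the `medoids` component of the pam result. `assign_to_medoid` is a hypothetical helper name, not a function from the cluster package.

```r
library(cluster)

x <- matrix(c(1, 1.2, 0.9, 2.3, 2, 1.8,
              3.2, 4, 3.1, 3.9, 3, 4.4), 6, 2)
fit <- pam(x, 2)

# Hypothetical helper (not part of the cluster package): for each row of
# newdata, return the row index of the nearest medoid (Euclidean distance).
assign_to_medoid <- function(medoids, newdata) {
  # Accept a single observation given as a plain vector
  if (is.null(dim(newdata))) newdata <- matrix(newdata, nrow = 1)
  # Squared distances to each medoid: one column per medoid
  d2 <- sapply(seq_len(nrow(medoids)), function(j) {
    rowSums(sweep(newdata, 2, medoids[j, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(newdata))
  # Column of the smallest distance in each row
  max.col(-d2, ties.method = "first")
}

y <- c(1.5, 4.5)
assign_to_medoid(fit$medoids, y)
```

For the example in the question, y lands in cluster 2, the one whose medoid is observation 6. The helper loops over the k medoids rather than over the new observations, so passing a whole matrix of new points at once should stay reasonably fast on a large dataset.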
Extension to any arbitrary distance function:
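A rough sketch of how this could look, assuming the distance is supplied as a function of two numeric vectors that returns a single number. `assign_to_medoid_custom` and `manhattan` are hypothetical names chosen for illustration, and the medoid coordinates are typed in from the pam output shown in the question.

```r
# Hypothetical helper: dist_fun(a, b) takes two numeric vectors and
# returns a single non-negative number.
assign_to_medoid_custom <- function(medoids, newdata, dist_fun) {
  # Accept a single observation given as a plain vector
  if (is.null(dim(newdata))) newdata <- matrix(newdata, nrow = 1)
  apply(newdata, 1, function(obs) {
    # Distance from this observation to every medoid
    d <- apply(medoids, 1, function(m) dist_fun(obs, m))
    which.min(d)
  })
}

# Example distance: Manhattan (city-block)
manhattan <- function(a, b) sum(abs(a - b))

# Medoids reported by pam(x, 2) in the question (observations 1 and 6)
medoids <- rbind(c(1.0, 3.2), c(1.8, 4.4))
y <- c(1.5, 4.5)
assign_to_medoid_custom(medoids, y, manhattan)
```

With these medoids, y is again assigned to cluster 2. This version calls `dist_fun` once per observation/medoid pair, so it will be slower than a vectorised Euclidean computation. For the assignment to agree with the clustering, it is probably sensible to use the same metric that was passed to pam.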