What is the appropriate method to cluster a binary matrix?


2.7k views Asked by Y.sarra At 29 May 2018 at 08:53

I am a beginner in clustering, and I have a binary matrix in which each row is a student and each column indicates whether that student is enrolled in a given session. I want to cluster students who attend the same sessions.

There are many clustering methods, and the right choice depends on the dataset.

For example, k-means is not appropriate here, because the data is binary and the standard "mean" operation does not make much sense for binary values.

I'm open to any suggestions.

Here's an example:

+------------+---------+--------+--------+
|  session1  | session2|session3|session4|
+------------+---------+--------+--------+
|     1      |    0    |   1    |    0   |
|     0      |    1    |   0    |    1   |
|     1      |    0    |   1    |    0   | 
|     0      |    1    |   0    |    1   |
+------------+---------+--------+--------+

Result:

clusterA = [user1,user3]

clusterB = [user2,user4]


There is 1 answer

**knb** · Accepted Answer · 2018-05-29T14:43:45+00:00

You could use the Jaccard distance for each pair of points.

In R:

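# create the data frame (one row per student, one column per session)
mat <- data.frame(s1 = c(TRUE, FALSE, TRUE, FALSE), s2 = c(FALSE, TRUE, FALSE, TRUE),
                  s3 = c(TRUE, FALSE, TRUE, FALSE), s4 = c(FALSE, TRUE, FALSE, TRUE))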

Result:

     s1    s2    s3    s4
1  TRUE FALSE  TRUE FALSE
2 FALSE  TRUE FALSE  TRUE
3  TRUE FALSE  TRUE FALSE
4 FALSE  TRUE FALSE  TRUE

 dist(mat, method="binary") # jaccard distance

Result:

Row 3 has a distance of 1 from row 4. By chance, all the distances here are exactly 0 or 1. In general they are floating-point values between 0 and 1. (Your toy dataset may be too simplistic to show this.)
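To see a fractional distance, here is a minimal sketch that computes the Jaccard distance by hand and compares it with `dist(..., method = "binary")`. The vectors `a` and `b` and the helper `jaccard_dist` are hypothetical, made up for illustration and not taken from the question's data. The helper assumes that at least one of the two rows contains a `TRUE`.

```r
# Hypothetical enrollment rows for two students (not from the question's data)
a <- c(TRUE, TRUE, FALSE, TRUE)
b <- c(TRUE, FALSE, FALSE, TRUE)

# Jaccard distance = 1 - |intersection| / |union|
# (illustrative helper; assumes at least one TRUE across x and y)
jaccard_dist <- function(x, y) 1 - sum(x & y) / sum(x | y)

d_manual  <- jaccard_dist(a, b)
d_builtin <- as.numeric(dist(rbind(a, b), method = "binary"))

print(d_manual)   # 2 shared sessions out of 3 attended by either student
print(d_builtin)  # should agree with the manual value
```

The two students share 2 of the 3 sessions that either one attends, so the distance is 1 - 2/3.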

Cluster them:

hclust(dist(mat, method="binary"))

Result (not very informative):

Call:
hclust(d = dist(mat, method = "binary"))

Cluster method   : complete 
Distance         : binary 
Number of objects: 4

Create a dendrogram plot:

plot(hclust(dist(mat, method="binary")))
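To get explicit cluster memberships like those asked for in the question, one option is to cut the tree with `cutree`. This is only a sketch: the choice of `k = 2` is an assumption based on the toy example, and the `user1` ... `user4` labels are borrowed from the question's expected result.

```r
# Same toy data as above
mat <- data.frame(s1 = c(TRUE, FALSE, TRUE, FALSE), s2 = c(FALSE, TRUE, FALSE, TRUE),
                  s3 = c(TRUE, FALSE, TRUE, FALSE), s4 = c(FALSE, TRUE, FALSE, TRUE))

# Hierarchical clustering on Jaccard distances
hc <- hclust(dist(mat, method = "binary"))

# Cut the dendrogram into k groups (k = 2 is assumed for this example)
groups <- cutree(hc, k = 2)

# List the members of each cluster, labelling rows as user1, user2, ...
clusters <- split(paste0("user", seq_along(groups)), groups)
print(clusters)
```

For real data the number of clusters is usually not known in advance. `cutree` also accepts a height `h` instead of `k`, which cuts the tree at a chosen distance threshold.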
