Fuzzy clustering in Python


I am trying to do fuzzy clustering of documents. The idea is to get a membership score for each document in each cluster.

I have computed the TF-IDF matrix for the entire corpus and then attempted to use the cmeans clustering from scikit-fuzzy (skfuzzy), but it returns a membership matrix in which every element has the same value.

import pandas as pd
import skfuzzy as fuzz

data = [ 
          [0.789, 0.45, 0, 0, 0.2],
          [0, 0.125, 0, 0.1, 0.4],
          [0.789, 0.45, 0, 0, 0],
          [0.9, 0.785, 0.123, 0, 0.2],
          [0, 0, 0.3, 0.5, 0.1] # goes on....
       ]
dist_matrix = pd.DataFrame(data)

data = dist_matrix.to_numpy()

num_clusters = 14
cntr, u, _, _, _, _, _ = fuzz.cluster.cmeans(data, num_clusters, 2, error=0.005, maxiter=1000)

What am I missing?

EDIT: I have added an MRE. Let's say that my dataset actually has 9k rows and close to 2k columns. I would like the fuzzy c-means output matrix 'u' to look like the following:

     1      2      3     4    .....    13
 0   0.3    0     0.2    0    .....     0
 1   0.45   0.3     0    0    .....     0
                    .....
9k    0     0       0    0    .....     0

That is, one row per document, holding its degree of membership in each of the 14 clusters.
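To make the target shape concrete, here is a minimal NumPy-only sketch of fuzzy c-means that returns one row per document and one column per cluster. It is not scikit-fuzzy's implementation: the function name fuzzy_cmeans, the seed, and the choice of 2 clusters for the five sample rows are assumptions made for illustration. The membership update is the standard one, u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), which also shows why memberships collapse to 1/c whenever a document is nearly equidistant from all centers.

```python
import numpy as np

def fuzzy_cmeans(X, n_clusters, m=2.0, error=1e-5, maxiter=1000, seed=0):
    """Illustrative fuzzy c-means (hypothetical helper, not skfuzzy).

    X has shape (n_docs, n_features).
    Returns (centers, u) where u has shape (n_docs, n_clusters)
    and every row of u sums to 1.
    """
    rng = np.random.default_rng(seed)
    eps = np.finfo(float).eps
    # Random initial memberships, each row normalised to sum to 1.
    u = rng.random((X.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(maxiter):
        um = u ** m
        # Cluster centers: membership-weighted means of the documents.
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        # Squared Euclidean distances via |x|^2 + |c|^2 - 2 x.c,
        # which avoids building an (n_docs, n_clusters, n_features) array.
        d2 = ((X ** 2).sum(axis=1)[:, None]
              + (centers ** 2).sum(axis=1)[None, :]
              - 2.0 * X @ centers.T)
        d = np.sqrt(np.fmax(d2, eps))
        # Standard membership update.
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < error
        u = u_new
        if converged:
            break
    return centers, u

# The five sample rows from the question: rows are documents.
data = np.array([
    [0.789, 0.45, 0, 0, 0.2],
    [0, 0.125, 0, 0.1, 0.4],
    [0.789, 0.45, 0, 0, 0],
    [0.9, 0.785, 0.123, 0, 0.2],
    [0, 0, 0.3, 0.5, 0.1],
])

# Assumption: 2 clusters, since 14 clusters cannot be meaningful for 5 rows.
centers, u = fuzzy_cmeans(data, n_clusters=2, m=2.0)
print(u.round(2))  # one row per document, one column per cluster
```

Two points may be relevant to the all-equal result. First, scikit-fuzzy's cmeans documents its data argument as shaped (features, samples) and its returned u as (clusters, samples), so with that library the equivalent call would presumably take data.T, and u.T would give the documents-by-clusters layout shown above. Second, on high-dimensional sparse TF-IDF vectors the distances to all centers tend to be similar, which pushes every membership toward 1/c when m=2; trying a fuzzifier closer to 1 (for example 1.1) or reducing dimensionality first are things to test, not guaranteed fixes.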

There are 0 answers