I am trying to perform fuzzy clustering of documents. The idea is to get a membership score for each document in each cluster.
I have computed the TF-IDF matrix for the entire corpus and then tried the cmeans clustering from scikit-fuzzy (skfuzzy), but it returns a membership matrix in which every element has the same value.
import pandas as pd
import skfuzzy as fuzz
data = [
[0.789, 0.45, 0, 0, 0.2],
[0, 0.125, 0, 0.1, 0.4],
[0.789, 0.45, 0, 0, 0],
[0.9, 0.785, 0.123, 0, 0.2],
[0, 0, 0.3, 0.5, 0.1] # goes on....
]
dist_matrix = pd.DataFrame(data)
data = dist_matrix.to_numpy()
num_clusters = 14
# 14 clusters, fuzziness exponent m=2; u is the returned membership matrix
cntr, u, _, _, _, _, _ = fuzz.cluster.cmeans(data, num_clusters, 2, error=0.005, maxiter=1000)
What am I missing?
EDIT: I have added an MRE. Let's say that my actual dataset has 9k rows and close to 2k columns. I would like the fuzzy c-means output matrix 'u' to look like the following:
      1     2     3     4    .....  13
0     0.3   0     0.2   0    .....  0
1     0.45  0.3   0     0    .....  0
.....
9k    0     0     0     0    .....  0
That is, one row per document, giving its degree of membership in each of the 14 clusters.