I can calculate cluster membership with KMeans
pretty easily:
open System
open System.IO
open Utils
open Accord
open Accord.Math
open Accord.MachineLearning
let vals = [|
[|1.0; 2.0; 3.0; 2.0|]
[|1.1; 1.9; 3.1; 4.0|]
[|2.0; 3.0; 4.0; 4.0|]
[|3.0; 3.1; 2.0; 3.0|]
[|2.0; 4.0; 3.0; 6.0|]
[|1.0; 5.0; 5.0; 7.0|]
[|4.0; 3.0; 6.0; 8.0|]
[|5.0; 4.0; 3.0; 6.0|]
[|6.0; 4.0; 8.0; 7.0|]
[|5.0; 6.0; 5.0; 9.0|]
[|4.0; 2.0; 7.0; 8.0|]
[|8.0; 9.0; 3.1; 2.2|]
[|8.0; 9.0; 2.0; 2.0|]
[|10.0; 2.0; 3.0; 2.0|]
[|10.1; 1.9; 3.1; 4.0|]
[|20.0; 3.0; 4.0; 4.0|]
[|22.0; 7.0; 2.0; 3.0|]
[|21.0; 4.0; 3.0; 6.0|]
|]
let kmeans = new KMeans 5
let clusterModel = kmeans.Learn vals
let clusters = clusterModel.Decide vals
Can I calculate partial membership with the standard KMeans
algorithm? A coworker suggested using the mean and variances of the cluster members to determine proportional membership and today I've been looking into fuzzy sets and their implementations for F#
. For example, here is some documentation for the Accord.net implementation for fuzzy sets. I can translate/run the example for F#
but at first glance, I don't see an easy way to get the data from my Kmeans
run above to fit the format of assigning partial membership.
Questions:
How would I use the mean/variance of cluster members to calculate partial membership?
Is there an easy way to calculate partial membership with
KMeans
clustering with the Accord.net library?The KMeans algorithm in Accord.net is simple to implement; should I spend some time trying to learn this method of clustering/membership to suit my problem rather than try and force Kmeans clustering to suit my needs?
As mentioned by Tomas, Accord.NET already gives you many of the building blocks. In particular, calling
clusterModel.Scores
gives you the (negative) distances to the cluster centroids, see source codeFrom the negative distances, you can compute an approximate class membership score by taking exponentials, similar to what you would do to compute a Gaussian PDF. In F#, that would look like:
There are a couple of caveats here:
Regarding your third question: I would not re-implement. It may seem easy initially, but there are usually plenty of corner cases and stability issues that you only run into after some time.