Given a dataset with 23 points spread out over 6 dimensions, in the first part of this exercise we should do the following, and I am stuck on the second half of this:
- Compute the first step of the CLIQUE algorithm (detection of all dense cells). Use three equal intervals per dimension in the domain 0..100,and consider a cell as dense if it contains at least five objects.
Now this is trivial and simply a matter of counting. The next part asks the following though:
- Identify a way to compute the above CLIQUE result by only using the functions of Weka provided in the tabs of Preprocess, Classify , Cluster , or Associate . Hint : Just two tabs are needed.
I've been trying this for over an hour now, but I can't seem to get anywhere near a solution here. If anyone has a hint, or maybe a useful tutorial which gives me a little more insight into weka it would be very much appreciated!
I am assuming you have 23 instances (rows) and 6 attributes (dimensions)
Use pre-process tab to discretize your data to 3 equal bins. See image or command line. You use 3 bins for intervals. You may choose to change useEqualFrequency to false and true and try again. I think true may give better results.
weka.filters.unsupervised.attribute.Discretize -B 3 -M -1.0 -R first-last
After that cluster your data. This will give show you near instances. Since you would like to find dense cells. I think SOM may be appropriate.
You have 23 instances. Therefore try for 2x2=4 cluster centers, then go for 2x3=6,2x4=8 and 3x3=9. If your data points are near. Some of the cluster centers should always hold 5 instances no matter how many cluster centers your choose.