I understand that HAC has several options in terms of linkage functions. You have:
- Single linkage which produces "straggly" clusters
- Complete linkage which produces tight, spherical clusters
- Average linkage which is sort of a compromise between the two
- Ward's method, which is based more on variance than on actual distance
What I'm trying to figure out is, how do I know which one of these I want to use? Are there certain datasets where "straggly" clusters are preferable to spherical ones? Or is it more a function of what I intend to do with the clustering data?
It depends on your data.
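Single-linkage works reasonably well on clean data, but it is sensitive to noise: a few stray points lying between two groups can chain them together into a single cluster.
If you have dirty data, the other linkages may be better.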
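To make the clean-versus-dirty point concrete, here is a minimal sketch using SciPy's `linkage` and `fcluster`. The data and the cut threshold of 2.5 are made up for illustration: two tight groups with three noise points bridging the gap between them.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 1-D data (invented for this example): two tight groups plus
# three noise points that form a "bridge" between them.
group_a = [0.0, 1.0, 2.0]
bridge = [4.0, 6.0, 8.0]
group_b = [10.0, 11.0, 12.0]
X = np.array(group_a + bridge + group_b).reshape(-1, 1)

cut = 2.5  # arbitrary merge-distance threshold for the flat clustering

labels = {}
for method in ("single", "complete"):
    Z = linkage(X, method=method)
    # Cut the dendrogram: merges above `cut` are not applied.
    labels[method] = fcluster(Z, t=cut, criterion="distance")
    print(method, labels[method])
```

Every neighbouring gap is at most 2, so single linkage chains all nine points into one cluster at this threshold. Complete linkage cannot put the points at 0 and 12 together, because a cluster formed below the cut has a diameter of at most 2.5. How the bridge points themselves get assigned depends on tie-breaking, so don't read too much into that part.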
Ward is similar to k-means. It may be a good choice if you want to talk about centroids and data partitioned completely into disjoint subsets.
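As a rough sketch of what "talking about centroids" can look like with Ward, assuming SciPy and some made-up, well-separated toy data: cut the Ward dendrogram into a fixed number of disjoint clusters, then average each cluster's points.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data (invented for this example): two well-separated 2x2 squares.
square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
X = np.vstack([square, square + 10.0])

Z = linkage(X, method="ward")  # Ward assumes Euclidean distances
labels = fcluster(Z, t=2, criterion="maxclust")  # ask for 2 flat clusters

# Each point belongs to exactly one cluster, so per-cluster means
# play the same role as k-means centroids.
centroids = np.array([X[labels == k].mean(axis=0) for k in np.unique(labels)])
print(labels)
print(centroids)
```

Unlike k-means, nothing here is iteratively re-optimised: the partition comes from greedy merges, so on less clear-cut data the result can differ from what k-means would find.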
The other problem is that only SLINK (for single-linkage) is fast, at O(n^2). All the others usually work in O(n^3), so they are not usable on large data sets. Compare this to e.g. DBSCAN, which runs in O(n log n) if done well (i.e. with a spatial index), or k-means in O(n) per iteration...