ABSTRACT

There are two basic forms of clustering algorithms: partitional and hierarchical. In their simplest form, partitional algorithms take as input the data set with N members, and a number, k << N and, in all likelihood, k > 2, and return a partition of the data set into k disjoint subsets that represent the clusters of the data set. There are however a vast number of possible partitions for large N and non-trivial k. Even if N = 50 and k = 5, there are 50 choose 5 possible partitions: there are over 2 million such partitions. To get a sense of the explosive nature of the number of possible partitions, say, we have 60 items and think that there are 15 clusters: there are over 53 trillion possible distinct partitions to search exhaustively to find the optimal one.