Jackalopes and Hares: Clustering | 5 | Data Science and Analytics with

ABSTRACT

A cluster is a group of similar data points, so the notion of similarity is central. This chapter presents some important algorithms that enable us to cluster hares and jackalopes, as well as rabbits and stags. Clustering provides a layer of abstraction from individual data points to collections of them that share similar characteristics, and in doing so it enhances our knowledge of our datasets. Cluster validation is an important part of the process, because it determines how effective the algorithm has been. It can also be used to identify clusters that should be split or merged, or individual points that have a disproportionate effect on the overall clustering. An overall clustering validity measure combines cohesion (how closely related the points within a cluster are) and separation (how distinct a cluster is from the others). The silhouette coefficient is one measure that combines both ideas.
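To make the silhouette coefficient concrete, here is a minimal sketch in plain Python, assuming Euclidean distance as the similarity measure. The function name `silhouette_scores` and the toy measurements are hypothetical and are not taken from the chapter. For each point, `a` is its mean distance to the other members of its own cluster (cohesion), `b` is its smallest mean distance to the members of any other cluster (separation), and the score is `(b - a) / max(a, b)`.

```python
from math import dist
from statistics import mean


def silhouette_scores(points, labels):
    """Return one silhouette coefficient per point (Euclidean distance assumed)."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # Cohesion: mean distance to the *other* members of the same cluster.
        # dist(point, point) is 0, so summing over the whole cluster is safe.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # Separation: mean distance to the nearest neighbouring cluster.
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
good_labels = [0, 0, 0, 1, 1, 1]
bad_labels = [0, 0, 1, 1, 1, 1]  # the third point is assigned to the wrong group

good_scores = silhouette_scores(points, good_labels)
bad_scores = silhouette_scores(points, bad_labels)
print("mean silhouette, sensible labels:", round(mean(good_scores), 3))
print("score of the misassigned point:", round(bad_scores[2], 3))
```

Scores close to 1 indicate a point that sits comfortably inside its cluster, while a negative score, as for the misassigned point here, flags a point that lies closer to another cluster. This is one way validation can single out individual points or clusters that deserve a second look.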