Clustering of Modal Valued Data

doi:10.1201/9781315370545-7

ABSTRACT

Adapted versions of k-means and hierarchical clustering for modal valued data are presented. All symbolic variable values (partitionings) are treated as unordered categories and represented as empirical probabilities with known totals/sizes. This representation allows the adapted algorithms to use the totals/sizes (or, more generally, the sums of all category values) as weights. We show that when the (weighted) squared Euclidean dissimilarity measure is used in the adapted methods, the final cluster representatives are also composed of the distributions of variable values in these clusters. The clusters are therefore easily interpretable and can be further quantified using the cluster specificity index, which, for each symbolic variable, compares the cluster representative with the representative of the entire dataset. In addition, the characteristic components of each symbolic variable in a cluster can be quantified using the contrast index.
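The claim about cluster representatives can be checked on a small example. The following is a minimal sketch, not the chapter's implementation: the function names and the category counts are hypothetical, and it covers a single symbolic variable only. It relies on the standard fact that the minimizer of a weighted sum of squared Euclidean distances is the weighted mean, so weighting each unit's empirical probabilities by its total should reproduce the pooled distribution of the cluster.

```python
def weighted_leader(counts):
    """Weighted mean of the units' empirical probability vectors,
    using each unit's total count as its weight (illustrative helper)."""
    totals = [sum(c) for c in counts]
    probs = [[x / n for x in c] for c, n in zip(counts, totals)]
    weight_sum = sum(totals)
    k = len(counts[0])
    return [
        sum(n * p[j] for p, n in zip(probs, totals)) / weight_sum
        for j in range(k)
    ]


def pooled_distribution(counts):
    """Distribution obtained by summing raw category counts over all units."""
    grand_total = sum(sum(c) for c in counts)
    k = len(counts[0])
    return [sum(c[j] for c in counts) / grand_total for j in range(k)]


# Hypothetical cluster: three units, one variable with three categories.
counts = [[8, 2, 0], [10, 30, 60], [5, 5, 10]]

leader = weighted_leader(counts)
pooled = pooled_distribution(counts)

# The two vectors agree up to floating-point rounding.
print(leader)
print(pooled)
```

With equal weights instead of totals, the representative would be the plain average of the probability vectors, which in general differs from the pooled distribution when the units have different sizes.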

The adapted method is applied to the TIMSS (Trends in International Mathematics and Science Study) data, and the results are interpreted using the newly defined cluster specificity and contrast indices.