ABSTRACT

In many applications one may want to cluster both the rows and the columns of a data matrix. For instance, in DNA microarray analysis the expression of p genes is recorded on n slides; while clustering the genes is of primary interest, clustering the slides identifies groups among the patients and is useful as well. Other applications arise in marketing (for instance, clustering customers and goods), biology, psychology, and sociology. Following arguments that date back at least to Fisher (1969) and Hartigan (1972), it can be argued that the most appropriate route in these cases is to cluster rows and columns simultaneously. This approach is called biclustering or double clustering in the statistical literature.