Big Data Scaling through Metric Mapping | 6

ABSTRACT

This chapter exploits the remarkable simplicity of very high-dimensional spaces. A particular benefit of correspondence analysis is its suitability for carrying out an orthonormal mapping, or scaling, of power-law-distributed data. The mean random projection approximates the marginal sum and is used for seriation, that is, a one-dimensional mapping, which in turn serves as the basis for the clustering. Normalized random projections differ little from unnormalized ones, although the scale of the projected values will differ depending on the normalization used. In practice, a limitation is that the seriation, or unidimensional scaling, used as the starting point for inducing a hierarchical clustering is not unique: many alternative orderings may be relevant for the hierarchy to be induced. Alternative analysis options in correspondence analysis are available at the early stage of selecting the data to analyse.
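
The relation between the mean random projection and the marginal sum can be illustrated with a minimal sketch. It is not the chapter's own implementation: it assumes synthetic Pareto-distributed data as a stand-in for power law distributed data, and random projection vectors drawn uniformly from [0, 1), so that each vector has a non-zero mean. All names (X, R, mean_projection, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes, chosen only for illustration.
n_rows, n_cols, n_proj = 1000, 50, 200

# Synthetic non-negative, heavy-tailed data (assumption: Pareto draws
# stand in for power law distributed data).
X = rng.pareto(2.0, size=(n_rows, n_cols))

# Marginal (row) sums of the data.
row_sums = X.sum(axis=1)

# Assumption: projection vectors are uniform on [0, 1). Each column of R
# is one random axis; X @ R holds the projection of every row on each axis.
R = rng.random((n_cols, n_proj))
mean_projection = (X @ R).mean(axis=1)

# The mean projection should be close to a rescaled row sum, so the two
# should be very highly correlated.
corr = np.corrcoef(row_sums, mean_projection)[0, 1]

# Seriation: a one-dimensional ordering of the rows by mean projection.
order = np.argsort(mean_projection)

print(f"correlation with row sums: {corr:.4f}")
print("first five rows in the seriation:", order[:5])
```

Under these assumptions the mean projection is approximately the row sum scaled by the mean of the uniform distribution, which is why the induced ordering is close to the ordering by marginal sum. A different random draw, or a different normalization, changes the scale and can change the ordering of near-tied rows, which is one way the non-uniqueness of the seriation shows up.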