ABSTRACT

This chapter examines Principal Component Analysis (PCA), the most widely used dimensionality reduction technique. The data scientist faces the task of using the mathematician's notions of dimension to characterize the complexity or information content of a dataset. Dimension measures how many degrees of freedom are at work in the underlying process that generated the data. The methodology for extracting a manifold representation of a particular dataset is often referred to as nonlinear dimension reduction or manifold learning, and it encompasses a wide variety of techniques. Algorithm development for understanding data is now well advanced, and the resulting toolset is quite powerful. We believe that clues for further development of these algorithms, and inspiration for new ones, can be found in the depths of mathematical theory.