ABSTRACT

Data analysis is often the first step performed after raw data leave an instrument. The raw data are usually noisy and high-dimensional (dimensionality here refers to the number of outputs of one measurement). We also include hand-crafted feature selection methods here; although they are not considered mainstream dimensionality-reduction tools, they can be very powerful in reducing the amount of data because users can design them with their full domain expertise in mind. Of course, in the interest of generality, one could always assume non-linearity and discover linearity; however, non-linear dimensionality reduction is much more cumbersome, so it often proves useful to make a pre-selection. Examples of these kinds of algorithms are classification, clustering, and other operators whose output is not a subset of the original input space. The authors will loosely call these methods "descriptor methods"; they may not be considered mainstream dimensionality-reduction methods.