ABSTRACT

Modern computing tools allow the collection and storage of massive datasets, a recent phenomenon commonly referred to as “big data”. The treatment and analysis of these increasingly vast and complex datasets is now one of the biggest challenges for statisticians and data analysts. Consequently, the last decade has seen intense research activity on high-dimensional problems. In particular, classification and dimension reduction problems have been considered in many recent papers. Classification techniques such as linear discriminant analysis, support vector machines, tree classifiers and nearest neighbor classifiers remain widely used and actively studied (see, e.g., Paindaveine & Van Bever 2015 and Scornet et al. 2015). In high-dimensional situations, however, those classical techniques tend to perform poorly, as pointed out by Bickel & Levina (2004). As a consequence, new high-dimensional classification methods have recently been introduced to the statistical community: Guo et al. (2007) proposed regularized discriminant analysis,