ABSTRACT

This chapter describes a nonparametric method for multivariate data analysis, interpretation, and classification based on tolerance ellipsoids. We propose a multivariate tolerance region built from special ellipsoids that cover a set of points. These ellipsoids have two special features: 1) their significance level depends only on the number of sample values, and 2) exactly one point lies on their surface. We describe algorithms for constructing these ellipsoids in high-dimensional space. We demonstrate the significance level of the ellipsoids, the computational complexity of the algorithm, and its practical usefulness. Further, we investigate the statistical properties of the proposed method and introduce a novel statistical depth function for multivariate random values, a novel method of multivariate ordering and statistical peeling, and a method for treating uncertainty in the classification of points using such ellipsoids. Because only one point lies on the surface of each such ellipsoid, it is possible to construct concentric ellipsoids, estimate quantile levels, detect outliers (anomalies), and order the points unambiguously with respect to their statistical depth. In addition, the method can improve the accuracy of classification and interpretation of points in high-dimensional space. A point lying deeper in a set has a higher rank, and comparing the rank of a point in each set allows the point to be interpreted unambiguously. The proposed method is useful for analyzing and interpreting multivariate data and for classifying multivariate random samples. It may be applied in various domains, such as medical diagnostics, econometric analysis, and image analysis.