ABSTRACT

John VV. Tukey was one of the first statisticians to provide a detailed description of exploratory data analysis (EDA). He defined it as detective work-numerical detective work or counting detective work or graphical detective work. Resistant data analysis pertains to those methods where an arbitrary change in a data point or small subset of the data yields a small change in the result. Re-expression has to do with the transformation of the data to some other scale that might make the variance constant, might yield symmetric residuals, could linearize the data, or add some other effect. The ability to analyze free-form text documents is an important application in computational statistics. The bigram proximity matrix (BPM) is a non-symmetric matrix that captures the number of times word pairs occur in a section of text. The leukemia data set was first discussed in Golub et al., where the authors measured the gene expressions of human acute leukemia.