ABSTRACT

This chapter lays out some useful preliminary steps in Object Oriented Data Analysis (OODA). The common theme is looking at data. This can be quite challenging in high dimensions, where the common linear regression device of studying the distribution of each predictor would require simultaneous visualization of perhaps tens of thousands of univariate distributions. An effective way of handling that issue is illustrated on the Drug Discovery data set using Marginal Distribution plots, which consider various types of representative variables. Another important issue is simple linear scaling of variables, where several sensible (and possibly quite divergent) analytic choices are highlighted. Nonlinear scaling of variables is also considered, and an automatic shifted log transformation is recommended for data sets whose variables have wildly varying amounts of skewness. Registration and alignment issues are also briefly surveyed.