ABSTRACT

This chapter reviews aspects of analytic and statistical methods for high-dimensional data analysis. The challenges of interpreting observed data patterns in large databases are also briefly reviewed, noting that problems in the design of the database affect graphical and data-based results as well as statistical models. The chapter then discusses the effects of model misspecification in relation to nonlinearity. It examines approaches drawn from mathematics and computer science that apply to the health analytic setting. A more subtle issue in large databases follows from being in the large-sample context, where laws of large numbers and central limit theorems apply. Other large-sample phenomena are also considered, such as power law patterns in data and the Tracy–Widom distribution for the largest eigenvalue, along with their implications when they are present. The data-based and computationally based sets of ideas viewed as analytics, or health analytics, are growing in application and complexity.