Multivariate Statistical Analysis

doi:10.1201/9781420010572-53

ABSTRACT

Vectors and matrices arise naturally in the analysis of statistical data, as we saw in Chapter 52. For example, suppose we take a random sample of n females and measure their heights x_i (i = 1, 2, …, n), giving us the vector x = [x₁, x₂, …, x_n]^T. We can treat x either as a random vector, which can give rise to different values depending on the sample chosen, or as a vector of observed values (the data) taken on by the random vector, from which we calculate a statistic such as the sample mean (average). We use the former approach when we want to make inferences about the female population. In this case, the value taken by a given x_i depends on how height varies across the population, and this variation is described by the statistical “distribution” of x_i, which is defined by a function of one variable called a probability density function (pdf). For example, x_i may follow the well-known normal distribution, which seems to apply to many naturally occurring measurements. This distribution has a probability density function that is completely characterized by two (unknown) parameters, the population mean μ and the population variance σ², where σ is the standard deviation; we write x_i ∼ N₁(μ, σ²), where “∼” means “distributed as.” (Throughout this chapter σ will always refer to a standard deviation and not to a singular value.) When the sample is random, the choice of any female does not affect the choice of any other, so that technically we say that the x_i are statistically independent and they all have the same distribution.
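The two views of x described above can be illustrated with a minimal Python sketch. It is an illustration only, not part of the chapter: the function names and the values μ = 162 and σ = 7 (heights in centimetres) are hypothetical choices made for the example. The sketch evaluates the normal pdf, draws independent observations x_i ∼ N₁(μ, σ²), and computes the sample mean from the observed vector.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of N_1(mu, sigma^2) evaluated at x (sigma is the standard deviation)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def draw_sample(n, mu, sigma, seed=None):
    """Draw n independent observations, each distributed as N_1(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def sample_mean(x):
    """Sample mean (average) of the observed vector x."""
    return statistics.fmean(x)


# Hypothetical population parameters, chosen only for illustration.
MU, SIGMA = 162.0, 7.0

# Two different random samples give two different observed vectors,
# and hence two different values of the same statistic.
x_first = draw_sample(50, MU, SIGMA, seed=1)
x_second = draw_sample(50, MU, SIGMA, seed=2)
print(sample_mean(x_first), sample_mean(x_second))
```

Fixing the seed corresponds to treating x as observed data, while varying it corresponds to treating x as a random vector whose value changes from sample to sample.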