ABSTRACT

Vectors and matrices arise naturally in the analysis of statistical data, as we saw in Chapter 52. For example, suppose we take a random sample of n females and measure their heights x_i (i = 1, 2, …, n), giving us the vector x = [x_1, x_2, …, x_n]^T. We can treat x either as a random vector, which takes different values depending on the sample chosen, or as a vector of observed values (the data) taken on by that random vector, from which we calculate a statistic such as the sample mean (average). We use the former approach when we want to make inferences about the female population. In this case, the value taken by a given x_i depends on how height varies across the population, and this variation is described by its statistical “distribution,” which is defined by a function of one variable called a probability density function (pdf). For example, x_i may follow the well-known normal distribution, which seems to apply to many naturally occurring measurements. This distribution has a probability density function completely characterized by two (unknown) parameters, the population mean μ and the population variance σ^2, where σ is the standard deviation; we write x_i ∼ N_1(μ, σ^2), where “∼” means “distributed as.” (Throughout this chapter σ will always refer to a standard deviation and not to a singular value.) When the sample is random, the choice of any female does not affect the choice of any other, so that technically we say that the x_i are statistically independent and they all have the same distribution.
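The two views of x described above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the values of MU and SIGMA are hypothetical, chosen only for illustration, and the helper names draw_sample and normal_pdf are assumptions introduced here. Each call to draw_sample yields one realization of the random vector, with the x_i drawn independently from the same N_1(μ, σ^2) distribution, and the sample mean is then computed from the observed values.

```python
import math
import random
import statistics

# Hypothetical population parameters (heights in cm), chosen only for illustration.
MU = 162.0     # population mean
SIGMA = 7.0    # population standard deviation (not a singular value)


def normal_pdf(x, mu, sigma):
    """Probability density function of the univariate normal N_1(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def draw_sample(n, rng):
    """Return one realization of the random vector x = [x_1, ..., x_n]^T.

    The x_i are independent draws from the same N_1(MU, SIGMA^2) distribution.
    """
    return [rng.gauss(MU, SIGMA) for _ in range(n)]


rng = random.Random(0)          # seeded so that the run is repeatable
x_a = draw_sample(10, rng)      # one observed data vector
x_b = draw_sample(10, rng)      # a second sample gives different values

# The sample mean is a statistic computed from the observed values.
print("sample mean of x_a:", statistics.mean(x_a))
print("sample mean of x_b:", statistics.mean(x_b))
print("density at the population mean:", normal_pdf(MU, MU, SIGMA))
```

The two sample means generally differ from each other and from MU, which is the sense in which x is a random vector. Inference about the population uses the distribution of such statistics rather than any single observed value.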