ABSTRACT

In Chapter 5, we covered probabilities and the outcomes of ‘experiments’. This chapter uses probability to discuss the distribution of values of numerical variables. Fallowfield et al. (2005: 97) recommended exploring data distributions before using inferential statistical procedures, warning against dismissing this activity as too simplistic. The probability of a variable taking some valid value for a given case or element is one. That is, we are certain that each case will have a value for each variable of interest. Probability can be used to describe the likelihood of each potential value occurring. For example, the probability distribution for the number of sets in completed men’s singles matches at Wimbledon might be 0.545 for three sets, 0.289 for four sets and 0.165 for five sets (each rounded to three decimal places). This is based on the 121 completed matches in the 2011 championship, of which 66 were three-set matches, 35 were four-set matches and 20 were five-set matches. The number of sets in this example is a random variable because it is a variable that assigns a numerical outcome (3, 4 or 5) to an ‘experiment’ (Anderson et al., 1994: 158). The numerical values of a random variable, together with the probability of each value occurring, can be used to calculate the expected value and variance of that variable. Discrete random variables take one of a finite set of values; the number of sets in a men’s singles tennis match at Wimbledon, which has three possible values, is an example. Continuous random variables can take any one of an infinite number of possible values. This chapter covers the different types of discrete and continuous probability distributions. The distributions used in statistical testing are of particular interest, and understanding them is a prerequisite for reading Chapter 7 of this book.
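As an illustration of how an expected value and variance follow from a discrete probability distribution, the short Python sketch below uses the Wimbledon 2011 set counts quoted above. The variable names (set_counts, probabilities and so on) are chosen here for illustration and are not taken from the chapter. The sketch applies the standard definitions: the expected value is the sum of each value multiplied by its probability, and the variance is the probability-weighted sum of squared deviations from the expected value.

```python
# Counts of completed men's singles matches by number of sets
# (Wimbledon 2011 figures quoted in the abstract).
set_counts = {3: 66, 4: 35, 5: 20}

total_matches = sum(set_counts.values())  # 121 completed matches

# Probability of each number of sets = relative frequency.
probabilities = {sets: n / total_matches for sets, n in set_counts.items()}

# Expected value: sum of value * probability.
expected_sets = sum(sets * p for sets, p in probabilities.items())

# Variance: probability-weighted squared deviation from the expected value.
variance_sets = sum(
    p * (sets - expected_sets) ** 2 for sets, p in probabilities.items()
)

for sets, p in probabilities.items():
    print(f"P({sets} sets) = {p:.3f}")
print(f"Expected number of sets = {expected_sets:.3f}")  # 3.620
print(f"Variance = {variance_sets:.3f}")                 # 0.566
```

Working from the raw counts rather than the rounded probabilities avoids the small rounding discrepancy in the quoted figures, which sum to 0.999 rather than exactly one.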