ABSTRACT

Statistical classification techniques can be seen as the reverse of analysis of variance (ANOVA) tests. In ANOVA, categorical independent variables are hypothesized to influence the values of a numerical scale dependent variable. In statistical classification, by contrast, we produce a predictive model of categorical group membership in terms of numerical scale independent variables, so the grouping variable is the dependent variable. Two statistical classification techniques are described, using prediction of match outcomes in the 2010 FIFA World Cup as an example. The first, discriminant function analysis, predicts membership of two or more groups. It is used to produce a predictive model of the pool stage matches of the World Cup, each of which could be a win, a draw or a loss for the higher ranked team. The second, binary logistic regression, predicts membership of a dichotomous grouping variable. It is used to predict the outcomes of the knockout stage matches, which are classified either as won by the higher ranked team or as upsets. The outcome is dichotomous because a knockout match cannot end in a draw: even if it goes to extra time and a penalty shoot-out, one of the two teams involved is still eliminated from the tournament.