ABSTRACT

A popular statistic for testing research questions involving categorical data is the chi-square test statistic. The chi-square statistic was developed by Karl Pearson to test whether two categorical variables were independent of each other. A typical research question involving two categorical variables can be stated as, “Is drinking alcoholic beverages independent of smoking cigarettes?” A researcher would gather data on both variables in a “yes-no” format, then cross-tabulate the data. The cross-tabulation of the data for this research question would look like the following:

                                  Do you drink alcoholic beverages?
                                      Yes          No
Do you smoke cigarettes?    Yes
                            No

Individuals would be asked both questions and their separate responses recorded. The cross-tabulation of the data would indicate the number of people who did smoke cigarettes and did drink alcoholic beverages, the number who did smoke cigarettes and did not drink alcoholic beverages, the number who did not smoke cigarettes and did drink alcoholic beverages, and the number who did not smoke cigarettes and did not drink alcoholic beverages. Consequently, four possible outcomes are represented by the cross-tabulation of the yes/no responses to the two questions. The chi-square statistic is computed by taking, in each of the four cells, the squared difference between the observed and expected frequencies divided by the expected frequency, and then summing these values across the cells. The chi-square formula is expressed as:

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

where O is the observed frequency and E is the expected frequency in a cell, and the sum is taken over the four cells.

The expected frequency in each of the four cells is obtained by multiplying that cell's row sum by its column sum and dividing by the total number of individuals. The expected cell frequencies are based on the two categorical variables being independent. The difference between what is observed and what is expected by chance alone forms the basis for the test of independence between two categorical variables. An example will help to illustrate how to calculate the expected frequencies and the chi-square statistic.
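As a minimal sketch of the calculation just described, the following Python code computes the expected frequencies and the chi-square statistic for a two-by-two table. The counts are hypothetical and chosen only for illustration; they do not come from the text. The function names expected_frequencies and chi_square are likewise assumptions made for this sketch.

```python
# Hypothetical counts (not from the text).
# Rows: smoke cigarettes (yes, no). Columns: drink alcoholic beverages (yes, no).
observed = [
    [40, 10],
    [20, 30],
]


def expected_frequencies(table):
    """Expected cell counts under independence: row sum * column sum / total."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


def chi_square(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_frequencies(observed)
statistic = chi_square(observed)

print(expected)             # [[30.0, 20.0], [30.0, 20.0]]
print(round(statistic, 2))  # 16.67
```

With these hypothetical counts, both row sums are 50 and the column sums are 60 and 40, so each expected frequency is 50 × 60 / 100 = 30 or 50 × 40 / 100 = 20. The larger the gaps between observed and expected counts, the larger the resulting statistic.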