ABSTRACT

Missing data in any study can still be valuable, although the issue rarely receives adequate attention in research or methods resources. Missing values on key variables, and cases that cannot be traced in successive waves of a longitudinal study, can tell us some interesting stories. This chapter explores a large-scale longitudinal dataset, the China Family Panel Study (CFPS), to consider what the missing data looks like in a study of participation equity in higher education (HE). Nearly half of the originally selected sample is missing after five waves. Instead of deleting cases and conducting complete case analysis, or creating artificial substitute values via imputation, all missing data is treated as a separate category in the ensuing analyses. The results show that students who are missing information on the languages used for daily communication at home, or the languages used in school, are much less likely to attend higher education. Students without data on the types of school and class that they attend also show lower HE participation rates. Furthermore, students missing information on parents’ occupations or parents’ political status are slightly under-represented in HE. Missing cases seem to be closely associated with disadvantaged characteristics such as having a rural hukou and a lower SES family background. To sum up, missing data in CFPS is clearly not missing randomly, but is more likely to be related to the substantive target of my study – the most disadvantaged students. Therefore, missing data should not be ignored, here or presumably in most other studies; otherwise, there is a risk of bias in the results.