Module 22. Generalizability Theory

Suppose that you realize that your test has many different potential sources of error: rater error, item sampling error, and temporal instability. How do you figure out which sources of error are serious concerns and which are trivially small? We have visited several different test theories throughout this book, some of which can help answer this question, though not directly. The most prevalent one has been classical test theory (CTT), which has provided a foundation for many of the scale development techniques that we have considered. With CTT, you could conduct separate studies to investigate test-retest, internal consistency, and inter-rater reliabilities, though it would be difficult to combine these different reliabilities into one meaningful statistic. In the last few modules, we have considered item response theory (IRT), which has led to more detailed insights about test behavior, though that framework seems even less equipped to untangle different sources of error. In this final module, we consider one more test theory paradigm, generalizability theory (GT), and demonstrate some of the powerful insights that can come from using it. And yes, GT will be able to untangle all of our different sources of error.