Estimating, Interpreting, and Maintaining the Meaning of Test Scores

doi:10.4324/9781315751672-6

ABSTRACT

This chapter discusses the interpretation of scores on credentialing tests, which derives primarily from the fundamental and critical process of standard setting. It also discusses maintaining the meaning of credentialing test scores over time, which depends primarily on the techniques of test equating. Throughout, connections are made to decisions related to test design, as the meaning of scores for credentialing tests is inextricably linked to test design and the intended uses of the credentialing test results. Most credentialing tests are scored using item response theory (IRT), which in some applications can result in different scoring weights across items, an approach sometimes referred to as "pattern scoring." Applications of the two-parameter logistic (2PL), three-parameter logistic (3PL), and/or generalized partial credit (GPC) models can include pattern scoring, although these models can also be used to scale and equate tests based on summed or weighted summed scores.
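
To illustrate what pattern scoring means in practice, the following minimal sketch uses the 2PL model, under which the maximum-likelihood proficiency estimate depends on the responses only through the discrimination-weighted sum of item scores. The item parameters, the response patterns, and the helper names (`p_2pl`, `log_likelihood`, `ml_theta`, `weighted_sum`) are hypothetical and chosen only for illustration; the grid search is a simplification, not the estimation method of any particular testing program.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, a_params, b_params):
    """Log-likelihood of a 0/1 response pattern at proficiency theta."""
    ll = 0.0
    for u, a, b in zip(responses, a_params, b_params):
        p = p_2pl(theta, a, b)
        ll += math.log(p) if u == 1 else math.log(1.0 - p)
    return ll

def ml_theta(responses, a_params, b_params, lo=-4.0, hi=4.0, steps=801):
    """Crude maximum-likelihood estimate of theta via grid search (illustrative only)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, a_params, b_params))

def weighted_sum(responses, a_params):
    """Discrimination-weighted summed score, the 2PL scoring weights applied to items."""
    return sum(a * u for a, u in zip(a_params, responses))

# Hypothetical three-item test: equal difficulty, unequal discrimination.
a_params = [0.6, 1.0, 1.8]
b_params = [0.0, 0.0, 0.0]

# Two examinees with the same summed score (2 of 3 correct) but different patterns.
pattern_a = [1, 1, 0]  # misses the most discriminating item
pattern_b = [0, 1, 1]  # misses the least discriminating item

theta_a = ml_theta(pattern_a, a_params, b_params)
theta_b = ml_theta(pattern_b, a_params, b_params)

print("summed scores:  ", sum(pattern_a), sum(pattern_b))
print("weighted sums:  ", weighted_sum(pattern_a, a_params), weighted_sum(pattern_b, a_params))
print("theta estimates:", theta_a, theta_b)
```

In this sketch the two examinees have identical summed scores, yet the examinee who answers the more discriminating item correctly receives the higher proficiency estimate. If all discriminations were equal, as in a Rasch-type model, the two patterns would receive the same estimate and the summed score alone would determine the result.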