Test scaling is the process of converting test performances into meaningful numbers. It begins with scoring individual items and combining them into a raw score such as the sum score or percent correct score. Raw scores convey adequate information for classroom teachers and researchers, particularly when they represent performance on a one-off test. They are less adequate for standardized testing programs that use multiple versions of a test and produce scores over a number of years. Standardized tests typically involve a raw score transformation (i.e. a scale score) that allows for scale maintenance and consistency in score interpretation over the lifetime of the testing program. Raw scores and scale scores obtain meaning by comparing them to a frame of reference such as the performance of other examinees or a content domain represented by the test (Tong & Kolen, 2010).