ABSTRACT

Two of the most central aspects of educational assessment are validity and reliability. Validity refers to the meaningfulness of score interpretations and of the inferences drawn from them. Reliability refers to how consistently test results replicate from one testing occasion to the next, and to the extent to which the observed test score reflects the true score. Score reliability therefore matters: scores that do not replicate cannot support meaningful interpretations. Assessing the reliability of tests composed of constructed-response items is more complex than assessing that of tests composed of multiple-choice items, because constructed responses often require human scoring. Two types of reliability must therefore be considered: assessment accuracy and scoring accuracy. In winsorizing, the lower and upper cut-points of an observed distribution are first identified. Then, for each feature, data points below the lower cut-point are set to the lower cut-point value, and data points above the upper cut-point are set to the upper cut-point value.
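The winsorizing step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the cut-points are percentiles of each feature's observed distribution (5th and 95th by default), because the abstract does not say how the cut-points are chosen. It also assumes the data are a list of rows with one column per feature. The function names `percentile`, `winsorize_column`, and `winsorize_features` are hypothetical.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100) of values, interpolating linearly
    between adjacent order statistics (assumed cut-point rule)."""
    if not values:
        raise ValueError("percentile of an empty sequence")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must lie in [0, 100]")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)                            # floor, since pos >= 0
    upper = min(lower + 1, len(ordered) - 1)    # stay inside the list
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def winsorize_column(values: Sequence[float],
                     lower_q: float = 5.0,
                     upper_q: float = 95.0) -> List[float]:
    """Clamp one feature: values below the lower cut-point become the lower
    cut-point, values above the upper cut-point become the upper cut-point."""
    lo = percentile(values, lower_q)
    hi = percentile(values, upper_q)
    return [min(max(v, lo), hi) for v in values]


def winsorize_features(rows: Sequence[Sequence[float]],
                       lower_q: float = 5.0,
                       upper_q: float = 95.0) -> List[List[float]]:
    """Winsorize every feature (column) independently, then rebuild the rows."""
    columns = list(zip(*rows))
    clipped = [winsorize_column(list(col), lower_q, upper_q) for col in columns]
    return [list(row) for row in zip(*clipped)]
```

For example, with `scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]`, calling `winsorize_column(scores, 10, 90)` puts the cut-points at 2 and 10. The outlying 100 becomes 10 and the 1 becomes 2, while the interior values are unchanged. Clamping rather than dropping extreme values keeps the sample size intact, which is why winsorizing is often preferred to trimming.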