ABSTRACT

Hartog and Rhodes (1936) documented the limitations of essay scoring. The overwhelming evidence was that, in practical high-stakes settings, the reliability of expert ratings was unacceptably low. Once automated scoring procedures were introduced for multiple-choice items, the lower reliability and relative inefficiency of constructed-response items made them all but obsolete.