ABSTRACT

The psychological test in which I became interested is a set of procedures known as student ratings of instruction. Anyone who teaches at a university in the United States has been exposed to these, as have those who teach in many other countries. In 1989 and 1990 I received widely divergent ratings for teaching the same course in 2 successive years, using the same syllabus, text, course requirements, meeting format, etc. Although the discrepancy could conceivably have been nothing other than sampling error, it was impressive in degree: my average ratings were separated by 8 deciles (about 2.5 standard deviations) on the university norms that accompanied the ratings reports I received. Searching for an explanation, I naturally suspected that the ratings were influenced by something other than the qualities of the instructor, which should have been almost as similar as possible over the two offerings of this particular course. Perhaps it was significant that I first approached this topic as an outsider with no previous research involvement in it. My first research on the topic therefore had some of the character of the naive child seeing the Emperor’s New Clothes.