Reliability II: Advanced methods | 11 | Assessing Competence in Medici

ABSTRACT

The intraclass correlation coefficient (ICC) is one way to determine inter-rater reliability. The ICC improves on Pearson’s r and Spearman’s rho because it takes into account differences in the ratings assigned to individual units as well as the correlation between raters. The reliability of an observation depends on the universe about which inferences are to be made. A given score may generalize to several different universes, and it may support inferences about some of these universes more reliably than others. Generalizability theory (G-theory) explicitly requires the specification of the universe of conditions over which observations can be generalized. The goal is to design a measure that samples enough instances from each facet of the universe of observations to yield a dependable estimate of the universe score. Generalizability (G) studies are designed to assess the dependability of a particular assessment technique, whereas decision (D) studies are designed to gather data for decisions about individuals.
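The sketch below is a minimal illustration under stated assumptions, not the chapter's own procedure. It assumes a fully crossed design in which every rater scores every examinee exactly once, treats persons and raters as random facets, and estimates variance components with the standard ANOVA expected-mean-squares formulas. The function names (`variance_components`, `icc_single_absolute`, `d_study`) and the ratings matrix are hypothetical. It shows how a single set of variance components yields both an ICC and a D-study projection of dependability as the number of raters changes. The ICC form chosen here is the single-rater, absolute-agreement form often labelled ICC(2,1). The abstract does not specify which ICC form is intended.

```python
from typing import Dict, List, Tuple


def variance_components(scores: List[List[float]]) -> Dict[str, float]:
    """Estimate variance components for a hypothetical fully crossed
    persons x raters (p x r) design, one rating per cell.

    scores[i][j] is the rating given to person i by rater j.
    Uses ANOVA (expected-mean-squares) estimators; negative estimates
    are truncated to zero, a common but not universal convention.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Sums of squares for persons, raters, and the p x r residual.
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pr = ss_tot - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # E(MS_p) = var_pr + n_r*var_p; E(MS_r) = var_pr + n_p*var_r; E(MS_pr) = var_pr
    return {
        "p": max((ms_p - ms_pr) / n_r, 0.0),  # universe-score (person) variance
        "r": max((ms_r - ms_pr) / n_p, 0.0),  # rater stringency/leniency variance
        "pr,e": ms_pr,                         # p x r interaction confounded with error
    }


def icc_single_absolute(vc: Dict[str, float]) -> float:
    """Single-rater, absolute-agreement ICC written in variance components."""
    return vc["p"] / (vc["p"] + vc["r"] + vc["pr,e"])


def d_study(vc: Dict[str, float], n_raters: int) -> Tuple[float, float]:
    """Project dependability for a score averaged over n_raters raters.

    Returns (G, Phi): G uses relative error (rank-ordering examinees);
    Phi uses absolute error (comparing examinees with a fixed standard).
    """
    rel_err = vc["pr,e"] / n_raters
    abs_err = (vc["r"] + vc["pr,e"]) / n_raters
    return vc["p"] / (vc["p"] + rel_err), vc["p"] / (vc["p"] + abs_err)


# Hypothetical ratings: 6 examinees (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
vc = variance_components(ratings)
print(f"{icc_single_absolute(vc):.3f}")  # 0.290
for n in (1, 2, 4, 8):
    g, phi = d_study(vc, n)
    print(n, f"{g:.3f}", f"{phi:.3f}")
```

In this sketch, Phi for a single rater equals the single-rater absolute-agreement ICC, which is one way to see that ICC as a special case of a G-theory coefficient. The loop is a small D study. It shows how both coefficients rise as more raters are averaged, with Phi lagging G here because rater stringency differences count as error only for absolute decisions.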