ABSTRACT

Researchers in many fields have become increasingly aware of the problem of errors in measurement. Physical scientists and engineers began investigating the scientific bases of measurement error more than one and a half centuries ago. In clinical and medical research, measurement errors arise in part from differences in the diagnoses reported by clinicians, in the accuracy of the measuring devices used by medical laboratories, and in the background training of medical technologists. These facts are widely known among scientists, particularly among clinical chemists, who have spent a great deal of effort trying to make sense of conflicting results for blood constituents. Clinical medicine offers numerous examples of these situations, and we list some of them here. In radiology, inconsistencies and inaccuracies in the reading of chest radiographs have long been recognized. For example, Birkelo et al. (1947) found that, of five readers who attempted to select radiographs suggesting pulmonary tuberculosis from a largely normal group, none succeeded in selecting more than 75% of the abnormal films. Moreover, when the same films were read again three months later, a reader was likely to change his mind in one of every five positive cases. Fletcher and Oldham (1964) found that when different observers were tested on their ability to grade and classify cases of pneumoconiosis, a lack of consistency persisted both between observers and within the same observer. Another example of interobserver inconsistency was reported by Yerushalmy et al. (1950), who asked six experienced readers to state whether the second film of each pair was unchanged, better, or worse. All six readers agreed on only two-thirds of the pairs. Analysis showed that disagreement was not confined to films of poor technical quality, and that unilateral disease was just as difficult to classify as bilateral disease.
On rereading the films, a reader was likely to disagree with his own previous interpretation of the radiographic appearances in about one case in five, and in one case in 14 he would reverse his reading of the second film from better to worse or vice versa.
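The agreement figures quoted above are simple proportions. As a minimal sketch of how such figures could be computed from raw ratings, the Python below assumes each film pair is rated with one of the three categories "unchanged", "better", or "worse". The function names and the toy ratings are hypothetical illustrations, not data or methods from the cited studies.

```python
from typing import Sequence

# Hypothetical helpers; ratings are assumed to be the strings
# "unchanged", "better", or "worse" (the categories in the Yerushalmy task).


def all_reader_agreement(ratings: Sequence[Sequence[str]]) -> float:
    """Proportion of items on which every reader gives the same rating.

    ratings[i][r] is reader r's rating of item (film pair) i.
    """
    if not ratings:
        raise ValueError("no items to rate")
    unanimous = sum(1 for item in ratings if len(set(item)) == 1)
    return unanimous / len(ratings)


def intra_reader_disagreement(first: Sequence[str], second: Sequence[str]) -> float:
    """Proportion of items one reader rates differently on two occasions."""
    if not first or len(first) != len(second):
        raise ValueError("readings must be non-empty and of equal length")
    changed = sum(1 for a, b in zip(first, second) if a != b)
    return changed / len(first)


def reversal_rate(first: Sequence[str], second: Sequence[str]) -> float:
    """Proportion of items switched between 'better' and 'worse'."""
    if not first or len(first) != len(second):
        raise ValueError("readings must be non-empty and of equal length")
    flips = sum(1 for a, b in zip(first, second) if {a, b} == {"better", "worse"})
    return flips / len(first)


# Hypothetical toy data: 3 readers rating 6 film pairs.
ratings = [
    ("unchanged", "unchanged", "unchanged"),
    ("better", "better", "unchanged"),
    ("worse", "worse", "worse"),
    ("better", "better", "better"),
    ("unchanged", "worse", "unchanged"),
    ("worse", "worse", "worse"),
]

# Hypothetical toy data: one reader's two readings of the same 10 pairs.
first = ["better", "unchanged", "worse", "worse", "unchanged",
         "better", "better", "unchanged", "worse", "unchanged"]
second = ["better", "unchanged", "worse", "better", "unchanged",
          "unchanged", "better", "unchanged", "worse", "unchanged"]

print(round(all_reader_agreement(ratings), 3))        # 0.667
print(intra_reader_disagreement(first, second))       # 0.2
print(reversal_rate(first, second))                   # 0.1
```

In this toy example, all readers agree on four of six pairs (two-thirds), and the single reader changes one rating in five, with one reversal in ten. The figures were chosen only to mirror the kinds of proportions reported above.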