ABSTRACT

The literature on searching databases propagates two criteria of search quality: “precision,” the proportion of relevant items among all retrieved items, and “recall,” the proportion of retrieved relevant items among all relevant items available in a database. Not only does the latter pose a paradox, being measurable only after all relevant items have been retrieved, which renders the measure superfluous; it also does not support the comparisons essential for evaluating the reliability of retrieved data.
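Stated formally, in standard notation that the chapter itself may not use: let R be the set of relevant items in a database and S the set of items a search retrieves. Then

    \text{precision} = \frac{|R \cap S|}{|S|}, \qquad
    \text{recall} = \frac{|R \cap S|}{|R|},

and computing recall presupposes knowing R, the very set the search is meant to find.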

This chapter develops three α-agreement measures. They are grounded in the distinction between what search engines do and how analysts judge their results. When several comparable searches are available, the observed disagreement of the whole process turns out to be decomposable into the disagreement due to search engines retrieving different items and the disagreement due to the analysts’ difficulty in defining search terms and their unreliability in identifying what is relevant. Relating these observed disagreements to their corresponding expected disagreements yields three coefficients: one α-agreement for the replicability of the whole process, one for the replicability of the search engines, and one for the replicability of the researchers handling the process.
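In outline, each coefficient takes the general form of Krippendorff’s α; the additive decomposition below is a sketch of the relationship described above, with subscript labels that are illustrative rather than the chapter’s own notation:

    \alpha = 1 - \frac{D_o}{D_e}, \qquad
    D_o^{\text{process}} = D_o^{\text{engines}} + D_o^{\text{analysts}},

so that

    \alpha_{\text{process}} = 1 - \frac{D_o^{\text{process}}}{D_e^{\text{process}}}, \quad
    \alpha_{\text{engines}} = 1 - \frac{D_o^{\text{engines}}}{D_e^{\text{engines}}}, \quad
    \alpha_{\text{analysts}} = 1 - \frac{D_o^{\text{analysts}}}{D_e^{\text{analysts}}},

where D_o denotes observed and D_e expected disagreement.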