All data are the result of human actions, whether through experiments, observations, or declarations. As such, the presumption of knowing what data are about is subject to imperfections that can affect the validity of research efforts. With calls for data-based research comes the need to assure the reliability of generated data. The reliability of converting texts into analyzable data, in particular, has become a burning issue in several areas. However, this issue has been met by only a few limited, and sometimes misleading, measures of the extent to which data can be trusted as surrogates of the phenomena of analytical interest. The statistic proposed by the author, "Krippendorff’s Alpha", is widely used in the social sciences, not only where human judgements are involved but also where measurements are compared.

The Reliability of Generating Data expands on the author’s seminal work in content analysis and develops methods for assessing the reliability of kinds of data that previously defied such evaluation. It opens with a discussion of the epistemology of reliable data, then presents the most basic alpha coefficient for the single-valued coding of predefined units. This largely familiar way of measuring reliability provides the platform for the succeeding chapters, which start with an overview of alternative coefficients and then extend alpha step by step to cope with the reliabilities of multi-valued coding, segmenting texts into meaningful units, big data, and information retrieval. It also includes a chapter on how to diagnose and remedy imperfections and one on applicable standards, all converging on the statistical issues of the reliability of generating data.


  • Provides an overview of methods for assessing the reliability of generating data
  • Expands a statistic proposed by the author, already widely used in the social sciences
  • Includes many easy-to-follow numerical examples to illustrate the measures
  • Written to be useful to beginning and advanced researchers from many disciplines, notably linguistics, sociology, psychometric and educational research, and medical science.

How I became interested in reliability issues.
1. On the epistemology of reliable data.
2. Simplest kinds: The replicability of categorizing predefined units.
3. Some properties of the Alpha.
4. Alpha compared with primarily nominal agreement measures.
5. Metric differences between single-valued units.
6. The quadrilogy for single-valued predefined units and big data.
7. Multi-valued coding of predefined units.
8. Partitioning continua and coding relevant segments.
9. Preserving the coherency of identified segments in continua.
10. Distinctions drawn within continua.
11. Text mining and information retrieval.
12. Diagnostic devices and remedial actions.
13. Some special applications.
14. Statistical considerations.
15. Reliability standards.
16. Toward a general calculus of differences and agreements.
Appendix.
References.