ABSTRACT

Evaluating data quality is a crucial stage in data science. Drawing on the author's experience in the area, this chapter presents a general methodology for assessing data quality that can be applied to any data set. Concepts and techniques from statistical demography and mathematics are used to build matrices and metrics that detect anomalies affecting the quality of the data. The methodology is illustrated with data from the Global Entrepreneurship Monitor, which collects information on entrepreneurship worldwide, for the case of Colombia. The results provide instruments such as data mining, coherence metrics, rate estimation, time analysis, and comparison with other sources, which together constitute a comprehensive analysis covering the most relevant dimensions of a complex data source. Applying the methodology to entrepreneurship in Colombia yields a consistent analysis of data quality for researchers and members of the general public who use this source of information for decision making.