ABSTRACT

One of the problems faced by the software engineering community is the scarcity of data for conducting empirical studies. However, software repositories can be mined to collect data that can be used to validate techniques and methods empirically. The empirical evidence gathered by analyzing data collected from software repositories is considered to be the most important support for the software engineering community these days. Such evidence allows software researchers to establish well-formed and generalized theories. The data obtained from software repositories can be used to answer a number of questions: Is design A better than design B? Is process or method A better than process or method B? What is the probability of a defect or change occurring in a module? Is the effort estimation process accurate? How long does it take to correct a bug? Is testing technique A better than testing technique B? Hence, the field of extracting data from software repositories is gaining importance in organizations across the globe and plays a central role in aiding and improving software engineering research and development practice.