ABSTRACT

As seen in Chapter 5, software repositories can be mined to assess data stored over a long period of time. Most of the previous chapters focused on techniques that can be applied to structured data. In addition to structured data, however, these repositories contain a large amount of unstructured data in the form of natural language text, such as bug reports, mailing list archives, requirements documents, source code comments, and identifier names. Manually analyzing such a large amount of data is very time-consuming and practically infeasible. Hence, text mining techniques are required to automate the assessment of these documents.