Feature Generation and Engineering for Software Analytics

doi:10.1201/9781315181080-13

ABSTRACT

This chapter provides an introduction to feature generation and engineering for software analytics. It shows how domain-specific features are extracted and used for three software engineering use cases: defect prediction, crash release prediction, and developer turnover prediction. It describes the features used in defect prediction, presents the features used in crash release prediction for apps, and discusses features generated from a monthly report for developer turnover prediction. Together, these three case studies demonstrate how features can be generated from different software artifacts for different software engineering problems. In general, there are two types of features for file-level defect prediction: code features, which measure properties of the code, and process features, which are extracted from the software development process. Foyzur Rahman and Premkumar Devanbu performed a large-scale empirical study to investigate how and why process features perform better than code features.
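The following minimal Python sketch shows the difference between the two feature types for a single file. It assumes that the file's source text and a list of the commits that touched it are already available. The `Commit` record, the helper names `code_features` and `process_features`, and the specific metrics are hypothetical illustrations, not the feature sets used in the chapter or in Rahman and Devanbu's study. The code features are non-comment lines of code and a count of branching statements. The process features are the number of commits, the number of distinct authors, and churn.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical record of one commit that touched the file."""
    author: str
    lines_added: int
    lines_deleted: int


def code_features(source: str) -> dict:
    """Hypothetical code features: measured from a snapshot of the file itself."""
    lines = [ln.strip() for ln in source.splitlines()]
    # Lines of code: non-blank lines that are not Python-style comments.
    loc = sum(1 for ln in lines if ln and not ln.startswith("#"))
    # Rough complexity proxy: lines that start with a branching keyword.
    branch_keywords = ("if ", "elif ", "for ", "while ")
    branches = sum(1 for ln in lines if ln.startswith(branch_keywords))
    return {"loc": loc, "branches": branches}


def process_features(commits: list[Commit]) -> dict:
    """Hypothetical process features: extracted from the file's change history."""
    return {
        "num_commits": len(commits),
        "num_authors": len({c.author for c in commits}),
        # Churn: total lines added plus lines deleted across all commits.
        "churn": sum(c.lines_added + c.lines_deleted for c in commits),
    }


# Usage with made-up example data.
source = """def f(x):
    # check sign
    if x > 0:
        return 1
    return 0
"""
history = [Commit("alice", 5, 0), Commit("bob", 2, 1), Commit("alice", 1, 1)]
features = {**code_features(source), **process_features(history)}
print(features)
# {'loc': 4, 'branches': 1, 'num_commits': 3, 'num_authors': 2, 'churn': 10}
```

In practice, the commit history would be mined from a version control system such as Git. The two feature sets would then be joined per file to form the input to a defect prediction model.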