ABSTRACT

Context: Software applications that can be extensively extended, changed or configured are usually referred to as Highly-Configurable Systems (HCSs). Testing techniques for HCSs aim at finding effective but manageable test suites that lead to the early detection of faults. Evaluating the effectiveness of these techniques in realistic environments is essential, but also challenging due to the lack of HCSs with available code, configuration models and fault reports.

Aim: In this chapter, we present the Drupal dataset, a collection of real-world data gathered from the popular open-source Drupal framework. This dataset allows variability testing techniques to be assessed with real data from an HCS.

Method: We collected extensive non-functional data from the Drupal Git repository and the Drupal website, including code changes in different versions of Drupal modules (e.g., 557 commits in Drupal v7.22) and the number of tests and assertions in the modules (e.g., 352 and 24,152, respectively). We also gathered the faults found in different versions of Drupal modules from the Drupal bug tracking system (e.g., 3,392 faults in Drupal v7.23). Additionally, we provided the Drupal feature model as a representation of the framework's configurability; with more than 2,000 million distinct Drupal configurations, it is one of the largest attributed feature models published so far. With 150 citations since its publication, the Drupal dataset has become a helpful tool for researchers and practitioners to conduct more realistic experiments and evaluations of HCSs.