Human Subsumption in Training Datasets for Music Generation

doi:10.1201/9781003480167-3

ABSTRACT

This chapter examines the subsumption of human creativity in machine learning models for music generation, with a focus on the ethical and epistemological implications of training data accumulation. I introduce a new categorisation of how data is accumulated into training datasets: scraped data and laboured data. The former is collected from material available online, often without consent; the latter is generated explicitly for AI training through various forms of unwitting or coerced labour. Drawing on Marxian and post-Marxian frameworks, I show how contemporary AI music systems operate through processes analogous to primitive accumulation, dispossessing musicians and online communities of their creative labour. I also describe an ongoing process of sanitisation, wherein ethically dubious data collection practices are laundered through layers of technical and institutional abstraction, obscuring their exploitative origins.