ABSTRACT

Data plays a critical role in machine learning. Every machine learning model is trained and evaluated using data, quite often in the form of a static dataset. The characteristics of these datasets fundamentally influence a model's behavior. Datasheets for datasets have the potential to increase transparency and accountability within the machine learning community, mitigate unwanted biases in machine learning systems, facilitate greater reproducibility of machine learning results, and help researchers and practitioners select more appropriate datasets for their chosen tasks. This chapter describes the process by which we developed the datasheet questions and the workflow that dataset creators can follow when answering them. It concludes with a summary of the impact of datasheets for datasets and a discussion of implementation challenges and avenues for future work.