Automating audio description

doi:10.4324/9781003003052-30

ABSTRACT

Audio description (AD) has established itself as a media accessibility service, but its reliance on the specialised skills of audio describers makes it difficult to expand the service in response to changing legislation and the exponential growth of audiovisual content across different media and platforms.

At the same time, research on automating the description of images and video scenes has shown initial successes owing to advances in computer vision and machine learning. Although the machine’s ability to capture and coherently describe the nuances and sequencing characteristic of audiovisual narratives is currently limited, the developments in computer vision have raised the question of whether automated or semi-automated methods of describing audiovisual content can be used to produce AD without compromising quality.

This chapter analyses the state of the art and the challenges of machine-generated image and video description and examines current approaches to advancing this field. It then reports on early practical initiatives and outlines future directions in this area. The focus is on complementarity and additionality, such as the use of automated methods to increase the availability of meaningful AD and the use of human knowledge about AD to advance such methods, rather than on attempts to replace human effort.