ABSTRACT

In the age of the Internet, trillions of bytes of media data are generated every day through telecommunications and social media. This surge of born-digital media data, such as instant voice/video messages, conference calls, podcasts and video blogs, offers researchers unprecedented opportunities to deepen their understanding of how human beings communicate and go about their social activities. However, such a large amount of data also brings a new problem: how may we plough through so much media data and extract meaningful information efficiently?

This chapter explores opportunities and challenges at the interface between digital humanities and multimodality research, focusing on the use of prosody and gesture in spoken communication. Following an overview of key methods and frameworks in prosody and gesture research, it highlights selected projects which have showcased the ways in which today’s computer technology has revolutionised multimodality as an area of research. In recent years, many new computer tools have become available to aid media data acquisition, processing and analysis. These tools have (semi-)automated many processes which were labour-intensive, expensive and tedious. Researchers can therefore now afford to compile and process substantially larger multimodal datasets much faster and at a much lower cost. The chapter also introduces tools which open up new avenues for researchers to acquire new types of multimodal data (e.g. YouTube videos) and data streams (e.g. GPS, heartbeats). In the sample analysis, we demonstrate a typical workflow that uses a range of these latest computer tools to generate a corpus of YouTube videos, automatically annotate prosodic patterns, align multiple data streams and perform a multimodal analysis of the use of the epistemic stance marker ‘I think’ in video blogs.