Cinema’s language is often treated in terms of distinct planes of image and sound and, indeed, of sound design and music. Some authors have sought to elucidate the mechanisms and functions through which sound and image may influence each other, one notable example being Chion’s (1994) audiovisual contract. However, many contemporary soundtracks provide examples in which solely sonic factors (texture, gesture, timbre) combine to delineate spatial attributes, while recent commentary relating to the haptic score (Mera 2016a) has highlighted the need to consider music and sound design in integrated contexts (Greene and Kulezic-Wilson 2016). This chapter treats cinematic narrative and perspective as productive crosstalk between and within sensory domains, activating associations that serve affective and integrating functions. In this context, theories of embodied cognition and conceptual metaphor (Lakoff and Johnson 1999; Fauconnier and Turner 1998) may have much to tell us about the cinematic experience. Building upon previous discussions of cinematic sound and space, we propose a new framework, a spatiotemporal contract, integrating features from models of embodied cognition (Johnson 2008), Smalley’s (1997) theory of electroacoustic music and Grey’s (1977) timbre space as refined through embodied models (Roddy and Bridges 2018), which are then cross-referenced with Ward’s (2015) discussion of embodied cognition in sound design.