The chapter introduces the main concepts used to analyze multimodal or cross-modal cohesion in film and illustrates how they can be applied to audiovisual translation (AVT). It discusses the theoretical framework, underlining that research into multimodal cohesion is still in its infancy and that much conceptualization and theorizing remains to be done. Using C. I. Tseng's model, it also discusses and illustrates how multimodal cohesion is maintained or (re)created in an accessible film clip with audio description (AD) for the blind and visually impaired and subtitling for deaf and hard of hearing audiences. Multimodality examines how individual modes function and how they can be combined into a unified whole. Multimodal cohesion constitutes a particular challenge in AVT and media accessibility: in AVT one mode is altered or translated and, as a result, the explicit or implicit interaction between the translated mode and the other modes may also be altered, sometimes unintentionally.