When audio and video meet, they amalgamate into an audiovisual whole that is more than the sum of its parts. This can be expressed with the model formula A + V = AV + x. The inquiry approaches the unknown x by tracing the congruencies of audio (A) and video (V) responsible for the development of audiovisual Gestalts (AV). As a result, x can be understood as the distance between audio and video. The chapter introduces three main reference parameters (character, structure, semantics) to establish the breadth of this distance. The congruency of the reference parameters helps determine whether an audiovisual fusion is dissonant or consonant.

An appeal to multiple sensory modalities calls for an intermedia point of reference, which is found in rhythm as a formal and analytical tool. As the intermediary of audiovisual fusions, rhythm is perceived by multiple senses, and its structures can result in temporal as well as spatial congruencies between auditory and visual objects. In this artistic research undertaking, rhythm is used as a method to generate practice-based knowledge. Audiovisual live performances are staged to understand audio and video as rhythmic instruments whose relationship to each other reveals itself in their being played together.