ABSTRACT

Our opinion is based on experimental results from studies on audio-visual speech perception recently carried out at our Institute. The relative contributions of the lips and the jaw to the visual information provided by a talking head have been evaluated. When the intelligibility of vowels in a consonantal context and of consonants in a vocalic context is assessed in the auditory and in the visual modes of perception, the two modalities do not always show strong complementarity. With dynamically presented images, the visual decision is taken at most 30 ms earlier than with a static presentation of the last image. The advantage of dynamic over static display is hence no more than a sixth of that of vision over audition. From our perspective, the results obtained by investigators who tested moving lights on rigid bodies or non-rigid faces should be understood within the framework of the computational shape-from-motion paradigm.