ABSTRACT

Learning from multiple sources is an important field of research with many applications. Among the benefits of such an approach is that different sources can correct each other and that the failure of one channel can be compensated more easily. The emotional state of a subject can provide helpful cues to a computer in a human-machine dialog. The problem of emotion recognition is inherently multimodal. The most intuitive way of inferring a user's state is to use facial expressions and spoken utterances; in addition, bio-physiological readings can be helpful in this context. In this study, a novel information fusion architecture for the classification of human emotions in human-computer interaction is proposed. We use information from the three modalities mentioned above. Our results show that combining different sources can improve classification. Furthermore, a reject option for the classifiers is evaluated and yields promising results.