ABSTRACT

Effective multimedia document classification depends on combining multiple modalities, specifically text and image. Typically, the features used in document classification are derived from either text content or image content alone. Researchers are therefore trying to combine text and image through multimodal learning and fusion methods. This process involves many challenges, however, and multimedia document classification has become a research problem of great interest in domains such as medicine and social media. This chapter provides an extensive survey of recent research on multimedia document classification based on text–image analysis. In particular, the survey covers classification background, multimodal learning strategies, multimodal fusion techniques, and multimodal classification applications and challenges. Finally, it draws conclusions and recommends directions for future research.