With the continuous advancement of artificial intelligence and deep learning, multimodal information processing and fusion has attracted widespread research attention. This paper provides a comprehensive overview of the development history and milestone works of multimodal information processing, along with the strategies and models used for multimodal fusion. Mainstream datasets for multimodal information processing and fusion are systematically classified and summarized according to the modalities they cover. Using modality type as the classification criterion, the paper then systematically reviews research progress in multimodal information processing and fusion, emphasizing the distinctions between modalities. The work is grouped into four categories: audio-visual, audio-text, visual-text, and visual-audio-text processing and fusion. For each category, the methods and models for processing and fusing the corresponding input modalities are examined in detail. Finally, a summary of and outlook on the development of multimodal processing and fusion are provided.
Key words
multimodal processing/multimodal information processing/multimodal fusion/deep learning