Survey on Multimodal Information Processing and Fusion Based on Modal Categories

Huang Wendong (黄文栋)¹, Wang Yifan (王怡凡)¹

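Author information

1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China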

Abstract

With the continuing development of artificial intelligence and deep learning, multimodal information processing and fusion has attracted wide attention from researchers. This paper summarizes the development history and milestone works of multimodal information processing, along with multimodal fusion strategies and models. Mainstream datasets for multimodal information processing and fusion are classified and summarized by modality category. Taking modality type as the classification criterion, the paper systematically reviews research progress in the field, emphasizes the distinctions between modalities, and divides multimodal information processing and fusion into four categories: audio-visual processing and fusion, audio-text processing and fusion, visual-text processing and fusion, and visual-audio-text processing and fusion. The processing and fusion methods and models for each combination of input modalities are examined in detail. Finally, the development of the field is summarized and future directions are discussed.

Key words

multimodal processing / multimodal information processing / multimodal fusion / deep learning

Funding

Supported by the Natural Science Foundation of Shandong Province (ZR202211180156)

Publication year

2024

Computer and Modernization (计算机与现代化)

Jiangxi Computer Society; Jiangxi Institute of Computing Technology

CSTPCD

Impact factor: 0.472

ISSN: 1006-2475

Number of references: 2
