
A text classification method based on multimodal fusion enhancement

Although multimodal text classification shows promise in specific application scenarios, it still has limitations. Existing multimodal fusion models require the input modalities to be aligned, so large amounts of incomplete multimodal data are simply discarded, which limits the scale and flexibility of the data usable at inference time. To address this problem, a text classification model based on multimodal fusion enhancement and an insufficient multimodal resource training method are proposed. Compared with traditional methods, the proposed model improves performance on a standard dataset by about 4.25% on average. Furthermore, when the modalities other than text are missing at a rate of 50%, the insufficient multimodal resource training method outperforms the traditional multi-route strategy by about 4%. These results indicate that the proposed model and training method are effective and offer clear advantages.
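The data-loss problem described in the abstract can be illustrated with a small sketch. This is not the authors' model: the function names (`cross_attend`, `fuse`), the toy vectors, and the residual fallback to text-only features are all assumptions chosen for illustration. It only shows why requiring aligned modalities shrinks the usable data, and how a cross-attention fusion step might instead pass text through unchanged when the auxiliary modality is missing.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, aux_vecs):
    """Single-head dot-product cross attention (hypothetical helper):
    one text query vector attends over auxiliary-modality vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, vec)) / math.sqrt(d) for vec in aux_vecs]
    weights = softmax(scores)
    return [sum(w * vec[i] for w, vec in zip(weights, aux_vecs)) for i in range(d)]

def fuse(text_vec, aux_vecs=None):
    """Residual fusion (an assumption, not the paper's design):
    fall back to the text features when the auxiliary modality is missing."""
    if not aux_vecs:
        return list(text_vec)
    attended = cross_attend(text_vec, aux_vecs)
    return [t + a for t, a in zip(text_vec, attended)]

# Toy dataset: half of the samples lack the auxiliary modality (50% missing rate).
samples = [
    {"text": [1.0, 0.0], "aux": [[0.5, 0.5], [0.0, 1.0]]},
    {"text": [0.0, 1.0], "aux": None},
    {"text": [1.0, 1.0], "aux": [[1.0, 0.0]]},
    {"text": [0.5, 0.5], "aux": None},
]

# A model that requires aligned modalities can only use the complete samples.
aligned_only = [s for s in samples if s["aux"] is not None]

# With a text-only fallback, every sample stays usable.
fused_all = [fuse(s["text"], s["aux"]) for s in samples]

print(len(aligned_only), "of", len(samples), "samples usable with strict alignment")
print(len(fused_all), "of", len(samples), "samples usable with the fallback")
```

In this toy setting, strict alignment keeps only half of the samples, while the fallback keeps all of them. How the proposed training method actually exploits incomplete samples is not specified in the abstract.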

Keywords: text classification; cross attention; multimodal fusion; insufficient multimodal resource training method

刘德志、何柳、刘幼峰、韩德纯


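School of Computer Science and Engineering, Beihang University, Beijing 100191, China

Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China

China Aero-Polytechnology Establishment, Beijing 100028, China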


Big Data Research (大数据)
Posts & Telecom Press (人民邮电出版社)

CSTPCD
ISSN: 2096-0271
Year, volume (issue): 2024, 10(2)