大数据 (Big Data Research), 2024, Vol. 10, Issue 2: 80-93. DOI: 10.11959/j.issn.2096-0271.2023067

A text classification method based on multimodal fusion enhancement

刘德志 (Liu Dezhi)¹, 何柳 (He Liu)², 刘幼峰 (Liu Youfeng)¹, 韩德纯 (Han Dechun)³

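Author information

  • 1. School of Computer Science, Beihang University, Beijing 100191, China; Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
  • 2. China Aero-Polytechnology Establishment, Beijing 100028, China
  • 3. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China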

Abstract

Multimodal text classification shows promise in specific application scenarios, but it still has limitations. Existing multimodal fusion models require the modalities of the input data to be aligned, so a large amount of incomplete multimodal data is simply discarded, which limits the scale and flexibility of the data available at inference time. To address this problem, we propose a text classification model based on multimodal fusion enhancement, together with an insufficient multimodal resource training method. Compared with traditional methods, the proposed model improves performance on a standard dataset by about 4.25% on average. Furthermore, when the missing rate of the modalities other than text is 50%, the insufficient multimodal resource training method outperforms the traditional multi-route strategy by about 4%. These results indicate the effectiveness of the proposed model and training method.

Key words

text classification / cross attention / multimodal fusion / insufficient multimodal resource training method

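Publication year: 2024
Journal: 大数据 (Big Data Research)
Publisher: 人民邮电出版社 (Posts & Telecom Press)
Indexed in: CSTPCD
ISSN: 2096-0271
Number of references: 27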