Multi-label classification network for underwater video based on multimodal semantic enhancement

Light propagating underwater is scattered and absorbed by water molecules, so objects appear blurred and distorted, which lowers the accuracy of detection based on underwater vision. To improve underwater detection accuracy, this study exploits the visual and audio modalities contained in underwater video and proposes a multi-label classification network for underwater video based on multimodal semantic enhancement (MCNEMS). The network generates enhanced label feature representations through attention-enhanced multimodal complementary encoding and decoding, which guides multi-label semantic association. Specifically, a modal semantic enhancement module performs common-independent encoding and decoding across modalities to strengthen both the information shared between modalities and the information specific to each one, and a multi-head attention (MHA) mechanism generates a multimodal complementary feature matrix, yielding an enhanced representation of underwater video content. To mine implicit correlations among labels, a graph association learning module based on dynamic graph convolution adaptively learns label semantic embeddings. Experiments on the proposed underwater video multi-label classification dataset (UVMCD) show that the proposed model performs well on all reported metrics.
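The two mechanisms named in the abstract, multi-head attention across modalities and a dynamic graph step over label embeddings, can be illustrated with a minimal pure-Python sketch. This is not the MCNEMS implementation: the function names, the omission of learned projection matrices, the way the adjacency matrix is derived from video-conditioned label embeddings, and the toy inputs are all assumptions made for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matmul(a, b):
    # (n x k) @ (k x m) for lists of lists.
    return [[dot(row, col) for col in zip(*b)] for row in a]

def attention(q, k, v):
    # Scaled dot-product attention: every query row attends over all key rows.
    scale = math.sqrt(len(q[0]))
    weights = [softmax([dot(qi, kj) / scale for kj in k]) for qi in q]
    return matmul(weights, v)

def multi_head_cross_attention(queries, context, num_heads):
    # Split the feature dimension into heads, attend per head, concatenate.
    # Assumption: learned projections are omitted to keep the sketch short.
    dim = len(queries[0])
    assert dim % num_heads == 0
    step = dim // num_heads
    out = [[] for _ in queries]
    for h in range(num_heads):
        sl = slice(h * step, (h + 1) * step)
        q = [row[sl] for row in queries]
        kv = [row[sl] for row in context]
        for dst, head_row in zip(out, attention(q, kv, kv)):
            dst.extend(head_row)
    return out

def dynamic_graph_step(label_emb, video_feat):
    # Assumption: "dynamic" means the adjacency depends on the input video.
    # Condition each label embedding on the video feature, build a
    # row-normalised adjacency from pairwise similarity, propagate once.
    cond = [[l + f for l, f in zip(row, video_feat)] for row in label_emb]
    adj = [softmax([dot(ri, rj) for rj in cond]) for ri in cond]
    return matmul(adj, cond)

def classify(visual, audio, label_emb, num_heads=2):
    # Visual frames query the audio frames; a residual sum is mean-pooled.
    fused = multi_head_cross_attention(visual, audio, num_heads)
    dim = len(visual[0])
    pooled = [sum(v[d] + f[d] for v, f in zip(visual, fused)) / len(visual)
              for d in range(dim)]
    labels = dynamic_graph_step(label_emb, pooled)
    # One independent sigmoid per label, as is usual for multi-label output.
    return [1.0 / (1.0 + math.exp(-dot(row, pooled))) for row in labels]

# Toy inputs: 3 visual frames, 2 audio frames, 3 labels, feature size 4.
visual = [[0.1, 0.3, -0.2, 0.5], [0.4, -0.1, 0.2, 0.0], [0.0, 0.2, 0.1, -0.3]]
audio = [[0.2, 0.1, 0.0, -0.1], [-0.3, 0.4, 0.1, 0.2]]
label_emb = [[0.5, 0.0, -0.5, 0.1], [0.1, 0.2, 0.3, 0.4], [-0.2, 0.6, 0.0, 0.1]]
probs = classify(visual, audio, label_emb)
```

Each entry of `probs` is an independent probability for one label, so several labels can be active for the same video. A trained model would replace the identity projections and the similarity-based adjacency with learned parameters.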

multi-label classification; multimodal; graph convolution; underwater video

Lu Zhenkun, Wang Su, Li Yun

School of Electronic Information, Guangxi Minzu University, Nanning 530006

School of Big Data and Artificial Intelligence, Guangxi University of Finance and Economics, Nanning 530003

2024

Modern Computer
中大控股

Impact factor: 0.292
ISSN: 1007-1423
Year, volume (issue): 2024, 30(14)