Multi-label classification network for underwater video based on multimodal semantic enhancement

Original title: 基于多模态语义增强的水下视频多标签分类网络


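Abstract (translated from the Chinese): In underwater environments, light propagation is affected by scattering and absorption by water molecules, so objects appear blurred and distorted, which lowers the accuracy of detection based on underwater vision. To improve underwater detection accuracy, this work exploits the visual and audio modalities contained in underwater video and proposes a multi-label classification network for underwater video based on multimodal semantic enhancement (MCNEMS). The network generates label-enhanced feature representations through attention-enhanced multimodal complementary encoding and decoding, which guide multi-label semantic association. Specifically, a modal semantic enhancement module performs common-independent encoding and decoding across modalities to strengthen both the information shared between modalities and the information specific to each one; a multi-head attention mechanism then generates a multimodal complementary feature matrix, yielding an enhanced representation of the underwater video content. To mine implicit correlations among labels, a graph association learning module based on dynamic graph convolution is designed to adaptively learn label semantic embeddings. Experiments are conducted on the proposed underwater video multi-label classification dataset (UVMCD), and the simulation results show that the proposed model performs well on all reported metrics.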

English abstract: The propagation of light in underwater environments is affected by scattering and absorption by water molecules, resulting in low accuracy in underwater visual detection. To improve the accuracy of underwater detection, this study proposes a multi-label classification network for underwater video based on multimodal semantic enhancement, which exploits the visual and audio modalities of underwater video. By enhancing the feature representations of labels with a multi-head attention (MHA) mechanism, the network captures multi-label semantic correlations and thereby improves classification accuracy. Specifically, a modal semantic enhancement module encodes and decodes the common and independent features of the modalities, which enriches the correlation information across modalities. To obtain the enhanced representation, multi-head attention is used to learn a multimodal complementary feature matrix, and a module based on dynamic graph convolution is used to adaptively learn label semantics. Experiments on the proposed underwater video multi-label classification dataset show that the proposed method achieves state-of-the-art performance.
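The abstract names two mechanisms without giving their details: multi-head attention that fuses the visual and audio modalities, and a graph convolution whose adjacency is learned dynamically over label embeddings. The following is a minimal sketch of both ideas, not the authors' MCNEMS implementation. Every function name, shape, and value is hypothetical. The sketch also assumes identity projections in the attention heads, visual features acting as queries over audio features, and a dot-product softmax adjacency for the label graph.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(m):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def cross_attention(queries, keys_values, num_heads):
    """Multi-head scaled dot-product attention from one modality to another.

    Assumption: identity projections, so each head is a slice of the feature
    dimension; a trained model would use learned projection matrices.
    """
    dim = len(queries[0])
    assert dim % num_heads == 0
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = [row[lo:hi] for row in queries]
        kv = [row[lo:hi] for row in keys_values]
        scores = matmul(q, transpose(kv))
        scores = [[s / math.sqrt(head_dim) for s in row] for row in scores]
        heads.append(matmul(softmax_rows(scores), kv))
    # Concatenate the per-head outputs along the feature dimension.
    return [sum((head[i] for head in heads), []) for i in range(len(queries))]

def dynamic_graph_conv(label_emb, weight):
    """One graph-convolution step whose adjacency is computed from its input.

    Assumption: the adjacency is a row-softmax over dot-product similarity of
    the current label embeddings, one common way to make the graph dynamic.
    """
    adjacency = softmax_rows(matmul(label_emb, transpose(label_emb)))
    propagated = matmul(matmul(adjacency, label_emb), weight)
    return adjacency, [[max(0.0, v) for v in row] for row in propagated]  # ReLU

# Toy inputs (made-up numbers): 2 video frames, 3 audio segments, 3 labels.
visual = [[1.0, 0.0, 0.5, 0.2], [0.3, 0.8, 0.1, 0.9]]
audio = [[0.2, 0.4, 0.6, 0.8], [0.9, 0.1, 0.3, 0.5], [0.5, 0.5, 0.5, 0.5]]
labels = [[0.1, 0.9, 0.0, 0.3], [0.7, 0.2, 0.4, 0.1], [0.3, 0.3, 0.8, 0.6]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

fused = cross_attention(visual, audio, num_heads=2)        # 2 x 4 fused features
adjacency, label_repr = dynamic_graph_conv(labels, identity)

# Pool the fused features over time and score each label with a sigmoid.
pooled = [sum(col) / len(fused) for col in zip(*fused)]
logits = [sum(p * w for p, w in zip(pooled, row)) for row in label_repr]
probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
```

Each label gets its own sigmoid score rather than a shared softmax, which is what distinguishes multi-label from single-label classification: several labels can be active for the same clip.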

English keywords:

multi-label classification; multimodal; graph convolution; underwater video

Authors:

Lu Zhenkun (卢振坤), Wang Su (王粟), Li Yun (李云)


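Author affiliations:

School of Electronic Information, Guangxi Minzu University, Nanning 530006

School of Big Data and Artificial Intelligence, Guangxi University of Finance and Economics, Nanning 530003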

Keywords (original Chinese):

多标签分类; 多模态; 图卷积; 水下视频

Publication year:

2024

DOI:

10.3969/j.issn.1007-1423.2024.14.001

Journal: Modern Computer (现代计算机)

中大控股


Impact factor: 0.292

ISSN: 1007-1423

Year, volume (issue): 2024, 30(14)