Multi-label classification network for underwater video via enhanced multimodal semantics
Light propagating in underwater environments is scattered and absorbed by water molecules, which lowers the accuracy of underwater visual detection. To improve this accuracy, this study proposes a multi-label classification network for underwater video that enhances multimodal semantics by exploiting multiple modalities of underwater video, such as the visual and audio modalities. By enhancing the feature representations of labels with a multi-head attention (MHA) mechanism, the network captures multi-label semantic correlations and thereby improves classification accuracy. Specifically, an enhancing-modal-semantic module encodes and decodes the common and independent features, which enriches the correlation information across modalities. To achieve the enhanced representation, multi-head attention is used to learn cross-modal dependencies, and a multimodal complementary module based on dynamic graph convolution is used to adaptively learn label semantics. Extensive experiments show that the proposed method achieves state-of-the-art performance on the proposed underwater video multi-label classification dataset.
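The dynamic graph convolution mentioned above can be illustrated with a minimal sketch: the adjacency among labels is rebuilt from the current label embeddings at each step (hence "dynamic") rather than fixed from co-occurrence statistics. All names, dimensions, and the similarity-softmax adjacency below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_graph_conv(label_emb, weight):
    """One dynamic graph-convolution step over label embeddings.

    The label adjacency is recomputed from the current embeddings
    (a hypothetical similarity-softmax choice), then label features
    are propagated along it and linearly transformed with ReLU.
    """
    adj = softmax(label_emb @ label_emb.T)          # (L, L) content-dependent adjacency
    return np.maximum(adj @ label_emb @ weight, 0)  # (L, d_out) propagated label features

rng = np.random.default_rng(0)
L, d_in, d_out = 6, 16, 16                 # 6 hypothetical labels
H = rng.standard_normal((L, d_in))         # initial label embeddings
W = rng.standard_normal((d_in, d_out)) * 0.1
H_new = dynamic_graph_conv(H, W)
print(H_new.shape)  # (6, 16)
```

Because the adjacency depends on the embeddings, stacking such layers lets label correlations adapt to the video content instead of staying static across samples.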
multi-label classification; multimodal; graph convolution; underwater video