
A salient object detection model based on a ConvMixer backbone

Salient object detection (SOD) algorithms mostly use backbone networks based on convolutional neural networks (CNNs) to extract features, but CNNs cannot capture long-range feature dependencies in an image. The Vision Transformer (ViT) divides the image into patches and propagates global context between patches through a Transformer to obtain long-range dependencies, but the Transformer's self-attention layer has quadratic time complexity. We therefore propose CM-PoolNet, a low-complexity patch-based SOD algorithm that improves the backbone of the classical PoolNet salient object detection model: the convolutional model ConvMixer replaces VGG and ResNet, and a new feature fusion method is introduced. Built on a U-shaped structure, the encoder applies patch embedding to the input image and feeds the result into a ConvMixer feature extractor composed of repeatedly stacked depthwise separable convolutions and dilated convolutions. A patch-based feature fusion module is designed for the decoder. Three losses, BCE, SSIM, and IoU, guide the model to learn the mapping between the input image and the ground-truth map at three levels: pixel, patch, and feature map. Experiments on the DUTS and ECSSD datasets show that the proposed model effectively segments salient object regions and accurately predicts fine structures with clear boundaries.
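
The three losses named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the equal 1:1:1 weighting, the non-overlapping 2x2 patches used for SSIM, the SSIM constants (the common 0.01^2 and 0.03^2 for values in [0, 1]), and all function names are assumptions made for illustration. Plain Python lists stand in for tensors so that the example stays self-contained.

```python
import math

EPS = 1e-7  # numerical guard; an illustrative choice, not from the paper


def bce_loss(pred, target):
    """Pixel-level term: mean binary cross-entropy over all pixels."""
    total, count = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, EPS), 1.0 - EPS)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def iou_loss(pred, target):
    """Map-level term: 1 - soft IoU computed over the whole map."""
    pairs = [(p, t) for prow, trow in zip(pred, target)
             for p, t in zip(prow, trow)]
    inter = sum(p * t for p, t in pairs)
    union = sum(p + t for p, t in pairs) - inter
    return 1.0 - (inter + EPS) / (union + EPS)


def _ssim(xs, ys, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM between two equally sized flat lists of values in [0, 1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) * (x - mx) for x in xs) / n
    vy = sum((y - my) * (y - my) for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den


def ssim_loss(pred, target, patch=2):
    """Patch-level term: 1 - mean SSIM over non-overlapping patches."""
    h, w = len(pred), len(pred[0])
    scores = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            xs = [pred[i + a][j + b]
                  for a in range(patch) for b in range(patch)]
            ys = [target[i + a][j + b]
                  for a in range(patch) for b in range(patch)]
            scores.append(_ssim(xs, ys))
    return 1.0 - sum(scores) / len(scores)


def hybrid_loss(pred, target, patch=2):
    """Unweighted sum of the three terms (the weighting is an assumption)."""
    return (bce_loss(pred, target)
            + ssim_loss(pred, target, patch)
            + iou_loss(pred, target))
```

In this sketch a prediction identical to the ground-truth map gives a loss close to zero, while an inverted prediction is penalized by all three terms. A training setup would normally compute the same quantities on batched tensors with sliding-window SSIM.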

salient object detection; patch embedding; mixed loss function; PoolNet; ConvMixer

Zhang Sibo (张斯博), Zhu Jinghua (朱敬华), Xi Heran (奚赫然), Du Xinyue (杜欣月)

School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China

National Natural Science Foundation of China (82374626)

2024

Journal of Engineering of Heilongjiang University
Heilongjiang University

Impact factor: 0.358
ISSN: 2095-008X
Year, volume (issue): 2024, 15(1)