Lightweight semantic segmentation algorithm based on multi-scale pooling and feature fusion

Semantic segmentation is an important component of visual understanding systems: it identifies what is present in an image and where. Existing semantic segmentation algorithms struggle to balance model complexity against segmentation accuracy, which makes them difficult to apply flexibly in practical scenarios. To address this problem, this paper proposes an efficient semantic segmentation algorithm based on multi-scale pooling and feature fusion, taking both performance and the number of network parameters into account. The method uses DeepLabv3+ as its base framework and an improved lightweight MobileNetV2 as the backbone network to reduce model complexity. A Distinctive Atrous Spatial Pyramid Pooling Module (DASPP) combines multi-scale pooling operations with atrous convolutions of different sizes to capture multi-scale target features and rich global contextual semantic information. In the decoder, an attention mechanism is introduced to strengthen feature representation, and a Multi-level Feature Fusion Network (MFFN) is proposed to fuse high-level and low-level features effectively, further improving segmentation accuracy. Compared with classical semantic segmentation methods, the proposed model greatly reduces the number of parameters while significantly improving performance. In experiments on the PASCAL VOC 2012 dataset, the model has only 6.66 M parameters and achieves an accuracy of 73.72% on the test set.
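To make the idea of combining atrous convolutions with multi-scale pooling concrete, the following is a minimal pure-Python sketch on a single-channel grid. It is not the paper's DASPP module: the function names (`atrous_conv2d`, `pooled_context`, `multi_scale_features`), the fixed 3x3 averaging kernel, the rates `(1, 2, 4)`, and the pooling scales `(1, 2)` are all assumptions chosen for illustration, and a real implementation would use learned kernels in a deep learning framework.

```python
from typing import List, Sequence

Grid = List[List[float]]


def atrous_conv2d(x: Grid, kernel: Grid, rate: int) -> Grid:
    """Same-size convolution whose kernel taps are spaced `rate` pixels apart.

    Positions outside the grid are treated as zero (zero padding).
    """
    h, w = len(x), len(x[0])
    k = len(kernel)
    c = k // 2  # index of the kernel centre
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    ii = i + (a - c) * rate
                    jj = j + (b - c) * rate
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def pooled_context(x: Grid, bins: int) -> Grid:
    """Average-pool into bins x bins cells, then copy each cell mean back
    to every pixel of that cell (nearest-neighbour upsampling).

    Assumes bins <= height and bins <= width so that no cell is empty.
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for bi in range(bins):
        r0, r1 = bi * h // bins, (bi + 1) * h // bins
        for bj in range(bins):
            c0, c1 = bj * w // bins, (bj + 1) * w // bins
            cell = [x[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            mean = sum(cell) / len(cell)
            for i in range(r0, r1):
                for j in range(c0, c1):
                    out[i][j] = mean
    return out


def multi_scale_features(
    x: Grid,
    rates: Sequence[int] = (1, 2, 4),   # illustrative atrous rates
    bins: Sequence[int] = (1, 2),       # illustrative pooling scales
) -> List[List[List[float]]]:
    """Return, for each pixel, one value per branch (atrous branches first,
    then pooling branches), i.e. a channel-wise concatenation."""
    box = [[1.0 / 9.0] * 3 for _ in range(3)]  # fixed averaging kernel, not learned
    branches = [atrous_conv2d(x, box, r) for r in rates]
    branches += [pooled_context(x, b) for b in bins]
    h, w = len(x), len(x[0])
    return [[[br[i][j] for br in branches] for j in range(w)] for i in range(h)]
```

Calling `multi_scale_features` on a small grid with a single bright pixel shows the intent: larger atrous rates let distant pixels respond to that pixel without adding kernel weights, while the pooling branches attach coarse, image-level context to every location.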

Keywords: deep learning; neural networks; image segmentation; attention mechanism; MobileNetV2

TANG Xuejin, YANG Weihua, YU Jinwei (唐雪瑾、杨卫华、于晋伟)

College of Mathematics, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China

2024

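Microelectronics & Computer (微电子学与计算机)
The 771st Research Institute of the Ninth Academy, China Aerospace Science and Technology Corporation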

CSTPCD
Impact factor: 0.431
ISSN: 1000-7180
Year, volume (issue): 2024, 41(12)