A lightweight street scene semantic segmentation algorithm incorporating criss-cross attention

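Semantic segmentation of urban street scenes is essential for applications such as autonomous driving and intelligent transportation systems. However, mainstream semantic segmentation models typically have many parameters and high computational complexity, which makes them hard to deploy efficiently on mobile or embedded devices and hard to run with real-time response. To address this problem, this paper proposes a lightweight street scene semantic segmentation algorithm that incorporates criss-cross attention. First, the lightweight MobileNetV3 network replaces the feature-extraction backbone of the original model, improving parameter utilization and reducing the total number of parameters. Second, the atrous spatial pyramid pooling (ASPP) module is improved: a lightweight densely connected ASPP module, LD-ASPP, chains multiple convolutional layers through dense connections and replaces the standard atrous convolutions in ASPP with depthwise separable atrous convolutions, reducing computation and improving training efficiency. A strip pooling module is also added to capture richer contextual information and strengthen the model's performance in complex scenes. Finally, criss-cross attention is integrated to capture the relationships between each pixel and its surrounding pixels as well as the dependencies between channels, yielding more accurate semantic segmentation. Experiments show that the algorithm reaches a mean intersection over union (mIoU) of 74.02% on the Cityscapes urban street scene dataset with a model parameter size of only 2.18 MB; compared with the DeepLabV3+ model that uses MobileNetV2 as its backbone, mIoU improves by 3.11 percentage points, training speed improves by 17.92%, and the parameter size is only 42.6% of the original model's.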
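The abstract attributes part of the reduction in computation to replacing standard atrous convolutions with depthwise separable ones. The minimal sketch below only counts weights for a single layer to show why that substitution shrinks a model. The channel widths (256 in, 256 out) and the 3 x 3 kernel are assumptions chosen for illustration, not values reported by the paper, and biases and normalization layers are ignored. The dilation rate does not appear because it changes the receptive field, not the number of weights.

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical channel widths, chosen only for illustration.
c_in, c_out = 256, 256
standard = conv_params(c_in, c_out)
separable = depthwise_separable_params(c_in, c_out)
ratio = separable / standard      # equals 1 / c_out + 1 / k**2

print(standard, separable, round(ratio, 4))
```

Under these assumed widths the separable layer keeps roughly 11.5% of the weights of the standard layer. How much of the reported overall saving comes from LD-ASPP rather than from the MobileNetV3 backbone is not stated in the abstract.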
A lightweight street scene segmentation algorithm incorporating criss-cross attention
Semantic segmentation in urban street scenes is crucial for enabling applications such as autonomous driving and intelligent transportation systems. However, mainstream semantic segmentation models often have large parameter sizes and high computational complexity, making them difficult to deploy efficiently on mobile and embedded devices. To address these issues, this paper proposes a lightweight DeepLabV3+ optimization algorithm for urban scene segmentation that incorporates criss-cross attention. First, the lightweight MobileNetV3 network replaces the original backbone for feature extraction, improving parameter efficiency and reducing the model's overall parameter size. Second, the ASPP (Atrous Spatial Pyramid Pooling) module is improved by designing a lightweight densely connected ASPP (LD-ASPP) module, in which multiple convolutional layers are chained through dense connections. Depthwise separable atrous convolutions replace the standard atrous convolutions in the ASPP module, reducing computational overhead and improving training efficiency. A strip pooling module is also introduced to capture richer contextual information, enhancing the model's ability to represent complex image scenes. Lastly, criss-cross attention is integrated to capture relationships between pixels and their surrounding regions, as well as channel dependencies, enabling more accurate semantic segmentation. Experimental results demonstrate that the proposed algorithm achieves a mean Intersection over Union (mIoU) of 74.02% on the Cityscapes urban street dataset, with a parameter size of only 2.18 MB. Compared to the DeepLabV3+ model using MobileNetV2 as the backbone, the mIoU improves by 3.11 percentage points, training speed increases by 17.92%, and the model's parameter size is only 42.6% of the original.
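The headline metric above is mIoU. As a reminder of what that number measures, here is a minimal sketch that computes it from a class confusion matrix in plain Python. The function name `mean_iou`, the row/column convention (rows are ground truth, columns are predictions), and the two-class toy matrix are assumptions made for this illustration; the abstract does not describe the authors' evaluation code.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: ground truth, columns: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]                               # pixels of class c labeled c
        fn = sum(confusion[c]) - tp                        # class c pixels labeled otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp   # other pixels labeled c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy two-class pixel counts, invented for illustration only.
toy = [[8, 2],
       [1, 9]]
print(mean_iou(toy))
```

For the toy matrix the per-class IoUs are 8/11 and 9/12, and mIoU is their unweighted average. Because every class counts equally, small classes such as poles or traffic signs affect the score as much as large ones such as road.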

image semantic segmentation; DeepLabV3+; lightweight; attention mechanism; strip pooling

Shao Yuwen (邵玉文), Pei Dong (裴东)

College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, Gansu 730070

Gansu Engineering Research Center of Intelligent Information Technology and Application, Lanzhou, Gansu 730070

image semantic segmentation; DeepLabV3+; lightweight; attention mechanism; strip pooling

2025

Journal of Northwest Normal University (Natural Science)
Northwest Normal University

Impact factor: 0.463
ISSN:1001-988X
Year, volume (issue): 2025, 61(1)