A lightweight street scene segmentation algorithm incorporating criss-cross attention
Semantic segmentation of urban street scenes is crucial for applications such as autonomous driving and intelligent transportation systems. However, mainstream semantic segmentation models often have large parameter counts and high computational complexity, making them difficult to deploy efficiently on mobile and embedded devices. To address these issues, this paper proposes a lightweight DeepLabV3+ optimization algorithm for urban scene segmentation that incorporates criss-cross attention. First, the lightweight MobileNetV3 network replaces the original backbone for feature extraction, improving parameter efficiency and reducing the model's overall size. Second, the ASPP (Atrous Spatial Pyramid Pooling) module is improved into a lightweight densely connected ASPP (LD-ASPP) module, in which the convolutional layers are densely connected in sequence. Depthwise separable atrous convolutions replace the standard atrous convolutions in the ASPP module, reducing computational overhead and improving training efficiency. A strip pooling module is also introduced to capture richer contextual information, strengthening the model's ability to represent complex scenes. Finally, criss-cross attention is integrated to capture relationships between pixels and their surrounding regions, as well as channel dependencies, enabling more accurate segmentation. Experimental results show that the proposed algorithm achieves a mean Intersection over Union (mIoU) of 74.02% on the Cityscapes urban street dataset with a parameter size of only 2.18 MB. Compared with the DeepLabV3+ model using MobileNetV2 as the backbone, mIoU improves by 3.11 percentage points, training speed increases by 17.92%, and the parameter size is only 42.6% of the original.
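To make the LD-ASPP substitution concrete, the sketch below shows how a depthwise separable atrous convolution factors a standard dilated convolution into a per-channel (depthwise) dilated stage followed by a 1x1 pointwise channel-mixing stage. This is a minimal NumPy illustration of the general technique, not the paper's implementation; the function name, tensor layout `(C, H, W)`, and kernel shapes are assumptions chosen for clarity. A standard k x k convolution costs k²·C_in·C_out weights, while this factorization costs only C_in·k² + C_in·C_out.

```python
import numpy as np

def depthwise_separable_atrous_conv(x, dw_kernels, pw_weights, dilation=2):
    """Sketch of a depthwise separable atrous convolution.

    x          : (C, H, W) input feature map
    dw_kernels : (C, k, k) one dilated kernel per input channel (depthwise stage)
    pw_weights : (C_out, C) 1x1 pointwise weights that mix channels
    Uses 'same' zero padding so the spatial size is preserved.
    """
    C, H, W = x.shape
    k = dw_kernels.shape[1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))

    # Depthwise stage: each channel is convolved independently with its own
    # kernel, sampled at a dilated (atrous) stride to enlarge the receptive field.
    dw = np.zeros((C, H, W))
    for c in range(C):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for u in range(k):
                    for v in range(k):
                        acc += dw_kernels[c, u, v] * xp[c, i + u * dilation,
                                                        j + v * dilation]
                dw[c, i, j] = acc

    # Pointwise stage: a 1x1 convolution, i.e. a linear mix across channels.
    out = np.tensordot(pw_weights, dw, axes=([1], [0]))  # (C_out, H, W)
    return out
```

With an identity depthwise kernel (a single centre tap) and an identity pointwise matrix, the operation reduces to a pass-through, which is a convenient sanity check for the padding and dilation indexing.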
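The key efficiency idea behind criss-cross attention is that each position attends only to the positions in its own row and column (roughly H + W locations) instead of all H x W locations of full non-local attention. The following is a simplified NumPy sketch of that idea, not the paper's (CCNet-style) module: the weight matrices `Wq`, `Wk`, `Wv`, the `(C, H, W)` layout, and the plain residual connection are illustrative assumptions, and the centre pixel is counted twice on the criss-cross path for simplicity.

```python
import numpy as np

def criss_cross_attention(x, Wq, Wk, Wv):
    """Simplified criss-cross attention sketch.

    x        : (C, H, W) feature map
    Wq, Wk   : (Cq, C) query/key projections
    Wv       : (C, C) value projection
    Each pixel attends over its full column and full row only,
    then the result is added back to the input (residual).
    """
    C, H, W = x.shape
    q = np.tensordot(Wq, x, axes=([1], [0]))  # (Cq, H, W)
    k = np.tensordot(Wk, x, axes=([1], [0]))  # (Cq, H, W)
    v = np.tensordot(Wv, x, axes=([1], [0]))  # (C,  H, W)

    out = np.zeros_like(v)
    for i in range(H):
        for j in range(W):
            # Keys/values on the criss-cross path of (i, j): column j + row i.
            keys = np.concatenate([k[:, :, j], k[:, i, :]], axis=1)  # (Cq, H+W)
            vals = np.concatenate([v[:, :, j], v[:, i, :]], axis=1)  # (C,  H+W)
            scores = q[:, i, j] @ keys                               # (H+W,)
            a = np.exp(scores - scores.max())                        # stable softmax
            a /= a.sum()
            out[:, i, j] = vals @ a                                  # weighted sum
    return out + x  # residual connection
```

In CCNet the module is applied twice (recurrently) so that information propagates from row/column neighbours to the full image; this sketch shows a single pass only.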