
DCaT: Lightweight Semantic Segmentation Model for High-Resolution Scenes

Semantic segmentation is a key task in computer vision for analyzing and understanding scenes. However, existing segmentation models have high computational and memory costs, which makes them unsuitable for lightweight semantic segmentation of high-resolution scenes. To address this problem, a new lightweight semantic segmentation model for high-resolution scenes, DCaT, is proposed. First, depthwise separable convolution extracts the local low-level semantics of the image. Second, a lightweight Transformer based on coordinate-aware and dynamic sparse mixed attention captures the global high-level semantics. Third, a fusion module injects the high-level semantics into the low-level semantics. Finally, a segmentation head outputs the per-pixel predicted labels. Experiments on the high-resolution Cityscapes dataset show that, compared with the baseline model, DCaT improves mean intersection over union (mIoU) by 1.5 percentage points, reduces model complexity by 26%, and increases inference speed by 12%. DCaT thus achieves a better balance between model complexity and performance in high-resolution scenes, demonstrating its effectiveness and practicality.
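To make two of the abstract's terms concrete, the following is a minimal sketch, not the authors' implementation. It shows why a depthwise separable convolution has fewer weights than a standard convolution, and how mean intersection over union can be computed from predicted and ground-truth labels. The function names, the channel sizes, and the toy label lists are assumptions chosen for illustration. The mIoU function is a simplified per-sample version; benchmark evaluation on Cityscapes typically accumulates counts over the whole dataset and ignores unlabeled pixels, which this sketch does not do.

```python
from typing import List, Sequence


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flattened per-pixel class labels (hypothetical inputs).
    """
    ious: List[float] = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:          # skip classes absent from both inputs
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Illustrative layer sizes, not taken from the paper.
    print(standard_conv_params(64, 128, 3))
    print(separable_conv_params(64, 128, 3))
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

For the illustrative 3 x 3 layer with 64 input and 128 output channels, the separable form needs 8,768 weights against 73,728 for the standard form, which is the kind of saving that motivates its use in lightweight models.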

Keywords: semantic segmentation; lightweight; high resolution; Transformer; sparse attention

Huang Kedi (黄科迪), Huang Heming (黄鹤鸣), Li Wei (李伟), Fan Yonghong (樊永红)


School of Computer Science, Qinghai Normal University, Xining 810008

State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining 810008


2025

Computer Engineering and Applications (计算机工程与应用)
North China Institute of Computing Technology


Peking University Core Journal (北大核心)
Impact factor: 0.683
ISSN: 1002-8331
Year, volume (issue): 2025, 61(1)