DCaT: Lightweight Semantic Segmentation Model for High-Resolution Scenes
Semantic segmentation is a critical task in computer vision for scene analysis and understanding. However, existing segmentation models incur high computational costs and memory demands, which makes them unsuitable for lightweight semantic segmentation in high-resolution scenes. To address this issue, a novel lightweight semantic segmentation model called DCaT is proposed, specifically designed for high-resolution scenes. First, the model extracts local low-level semantics from the image using depthwise separable convolution; second, it obtains global high-level semantics using a lightweight Transformer based on coordinate-aware and dynamic sparse mixed attention; then, the high-level semantics are injected into the low-level semantics through a fusion module; finally, pixel-wise prediction labels are output through the segmentation head. Experimental results on the high-resolution Cityscapes dataset show that, compared with the baseline model, DCaT improves the mean intersection over union by 1.5 percentage points, reduces model complexity by 26%, and increases inference speed by 12%. DCaT thus achieves a better balance between model complexity and performance in high-resolution scenarios, demonstrating its effectiveness and practicality.
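The depthwise separable convolution used for the local low-level branch can be illustrated with a minimal sketch. This is not the paper's implementation (the abstract gives no code); it is a plain-Python illustration of the standard operation: a per-channel (depthwise) 3x3 convolution followed by a 1x1 (pointwise) convolution that mixes channels, which is what makes the operator cheaper than a full convolution.

```python
# Minimal sketch of a depthwise separable convolution (assumption:
# plain Python lists as [C][H][W] feature maps, stride 1, no padding).

def depthwise_conv(x, kernels):
    """Per-channel 3x3 convolution: kernels is [C][3][3], one kernel per channel."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * (W - 2) for _ in range(H - 2)] for _ in range(C)]
    for c in range(C):
        for i in range(H - 2):
            for j in range(W - 2):
                out[c][i][j] = sum(
                    x[c][i + di][j + dj] * kernels[c][di][dj]
                    for di in range(3) for dj in range(3)
                )
    return out

def pointwise_conv(x, weights):
    """1x1 convolution: weights is [C_out][C_in]; mixes channels per position."""
    C_in, H, W = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(weights[o][c] * x[c][i][j] for c in range(C_in))
          for j in range(W)] for i in range(H)]
        for o in range(len(weights))
    ]

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise pass (spatial filtering) followed by pointwise pass (channel mixing)."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)
```

For a 3x3 kernel, C input channels, and C' output channels, this factorization costs roughly 9·C + C·C' multiplications per position instead of 9·C·C' for a standard convolution, which is one reason such operators appear in lightweight backbones.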