An Adaptive Road Scene Semantic Segmentation Network with Refined Multi-Scale Perception and Optimized Contours
Semantic segmentation is usually formulated as a pixel-level classification task, whereas the MaskFormer model, which combines a Convolutional Neural Network with a Transformer, formulates it as a mask-level classification task. To address the problems of weak deformation modeling, blurred object-contour segmentation, and slow convergence, an adaptive road scene semantic segmentation network with refined multi-scale perception and optimized contours is proposed. First, a bottleneck structure built by stacking standard convolution and deformable convolution is used in the encoder to improve the network's deformation modeling ability. Then, a feature refinement module is adopted in the decoder to filter out irrelevant features, further improving the decoding ability of the feature pyramid network. To address pixel misalignment in the up-sampled features when the feature pyramid network performs multi-level feature fusion, a feature calibration module is introduced to improve the segmentation of object contours. Finally, the Miti-DETR decoder is employed in the Transformer module to accelerate training and improve segmentation accuracy. Experimental results show that the proposed network surpasses existing semantic segmentation models on the Cityscapes and Mapillary Vistas datasets.
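To make the encoder design concrete, the following is a minimal sketch of a bottleneck that stacks standard convolutions with a deformable convolution, the general idea described above. It assumes PyTorch and torchvision; the channel sizes, layer ordering, and class name are illustrative assumptions, not the paper's exact specification.

```python
# Illustrative sketch only: the precise bottleneck layout in the paper may differ.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableBottleneck(nn.Module):
    def __init__(self, in_ch: int, mid_ch: int):
        super().__init__()
        # Standard 1x1 convolution reduces channels before the deformable stage.
        self.reduce = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
        )
        # Offset prediction for the 3x3 deformable convolution:
        # 2 values (dx, dy) per kernel sampling position.
        self.offset = nn.Conv2d(mid_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(mid_ch, mid_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(mid_ch)
        # Standard 1x1 convolution restores the channel count for the residual sum.
        self.expand = nn.Sequential(
            nn.Conv2d(mid_ch, in_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(in_ch),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.reduce(x)
        # Sampling locations are shifted by learned offsets, which is what
        # gives the block its improved deformation modeling ability.
        y = self.act(self.bn(self.deform(y, self.offset(y))))
        y = self.expand(y)
        return self.act(x + y)  # residual connection around the bottleneck


if __name__ == "__main__":
    block = DeformableBottleneck(in_ch=64, mid_ch=16)
    out = block(torch.randn(1, 64, 32, 32))
    print(out.shape)  # torch.Size([1, 64, 32, 32])
```

The residual form keeps the block drop-in compatible with a standard encoder stage, while the learned offsets let the 3x3 kernel sample off the regular grid and adapt to object deformation.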