Traffic scene perception algorithm with joint semantic segmentation and depth estimation
Inspired by the idea that feature information from different pixel-level visual tasks can guide and optimize one another, a traffic scene perception algorithm based on multi-task learning was proposed for joint semantic segmentation and depth estimation. A bidirectional cross-task attention mechanism was introduced to explicitly model the global correlation between tasks, guiding the network to fully explore and exploit the complementary pattern information between them. A multi-task Transformer was constructed to enhance the global spatial representation of task-specific features, implicitly model the cross-task global context, and promote the fusion of complementary pattern information between tasks. An encoder-decoder fusion upsampling module was designed to effectively fuse the spatial details preserved in the encoder and generate fine-grained, high-resolution task-specific features. Experimental results on the Cityscapes dataset show that the proposed algorithm achieves a mean IoU of 79.2% for semantic segmentation, a root mean square error of 4.485 for depth estimation, and a mean relative error of 6.1% for distance estimation of five typical classes of traffic participants. Compared with mainstream algorithms, the proposed algorithm achieves better overall performance with lower computational complexity.
Keywords: perception of traffic environment; multi-task learning; semantic segmentation; depth estimation; Transformer
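The bidirectional cross-task attention described in the abstract — each task's features querying the other task's features to model global cross-task correlation — can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' implementation: the module name, head count, and residual fusion are assumptions, and standard multi-head attention stands in for whatever attention variant the paper actually uses.

```python
import torch
import torch.nn as nn

class BidirectionalCrossTaskAttention(nn.Module):
    """Illustrative sketch (assumed structure): segmentation features attend
    to depth features and vice versa, explicitly modeling the global
    correlation between the two tasks."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # One cross-attention per direction: seg <- depth and depth <- seg
        self.seg_from_depth = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.depth_from_seg = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, seg_feat: torch.Tensor, depth_feat: torch.Tensor):
        # seg_feat, depth_feat: (batch, tokens, channels) flattened spatial features
        seg_attn, _ = self.seg_from_depth(seg_feat, depth_feat, depth_feat)
        depth_attn, _ = self.depth_from_seg(depth_feat, seg_feat, seg_feat)
        # Residual fusion of the complementary information from the other task
        return seg_feat + seg_attn, depth_feat + depth_attn

# Example usage with dummy flattened feature maps
seg = torch.randn(2, 64, 128)
depth = torch.randn(2, 64, 128)
attn = BidirectionalCrossTaskAttention(dim=128)
seg_out, depth_out = attn(seg, depth)
```

Both outputs keep the input shape, so the module can be inserted between decoder stages without altering the surrounding tensor layout.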