Road Image Free Space Detection via Learnable Deep Position Encoding
Objectives: Free space detection is a crucial foundation for scene perception in advanced driver assistance systems. Convolutional neural network-based methods cannot build global contextual information, which produces voids and interruptions in the predicted results. Meanwhile, Transformer-based methods lack local understanding, resulting in boundary misalignment and overshoot. Methods: To this end, we propose a pyramid Transformer architecture with learnable deep position encoding for road free space detection. First, the pyramid Transformer backbone is designed to extract road features from a global perspective. Second, local window attention is employed in dual-Transformer blocks to compensate for the loss of detail. Finally, to address the problem that traditional unlearnable position encoding ignores the spatial correlation between pixels and the real world, a learnable position encoding derived from deep convolutional features is constructed to resolve attention and semantic misalignment. Results: The model is tested and evaluated on the KITTI road, Cityscapes, and Xiamen road datasets. The results show that our method achieves a maximum F-measure of 97.53% and 98.54% on KITTI and Cityscapes, respectively. Conclusions: Our method outperforms existing algorithms on the KITTI road benchmark, providing higher stability and accuracy while maintaining high efficiency. Meanwhile, our method supplies high-precision semantic priors for tasks such as path planning and trajectory prediction in automotive driver assistance systems.
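The abstract does not give implementation details for the learnable position encoding derived from deep convolutional features. A minimal sketch of one common realization of this idea, using a depthwise convolution over the feature map whose zero padding leaks absolute spatial position into the tokens, is shown below (all names, shapes, and the residual design are illustrative assumptions, not the authors' actual architecture):

```python
import numpy as np

def conv_position_encoding(feat, kernels):
    """Learnable position encoding from a deep convolutional feature map.

    feat    : (C, H, W) feature map from a convolutional stage.
    kernels : (C, 3, 3) learnable depthwise kernels, one per channel.
    Zero padding at the borders makes the convolution output depend on
    absolute position, so the added term acts as a position encoding.
    """
    C, H, W = feat.shape
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)))  # zero-pad spatial dims
    out = np.zeros_like(feat)
    for c in range(C):  # depthwise: each channel uses its own 3x3 kernel
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(padded[c, i:i + 3, j:j + 3] * kernels[c])
    return feat + out  # residual: tokens keep content plus a positional cue

# Usage: encode a 4-channel 8x8 feature map, then flatten it into
# (H*W) tokens of dimension C for a Transformer block.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
kernels = rng.standard_normal((4, 3, 3)) * 0.1  # stands in for learned weights
tokens = conv_position_encoding(feat, kernels).reshape(4, -1).T  # (64, 4)
```

Because the encoding is computed from the features themselves rather than from a fixed sinusoidal table, it can adapt to input resolution and to the spatial statistics of road scenes, which is the property the abstract contrasts against "unlearnable" position encodings.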