Monocular height estimation method for remote sensing images based on Swin Transformer-CNN and its application to highway road construction sites
At present, when image geometry and radiometric quality are good, 3D scene reconstruction by dense matching of multi-view aerospace images is relatively mature and achieves good results in both accuracy and efficiency. However, when multi-view aerospace images with good geometric conditions are difficult to obtain, the geometric processing methods of classical photogrammetry and computer vision may face great challenges. In this paper, we study this problem and propose a monocular height estimation method for remote sensing images based on Swin Transformer and a convolutional neural network (CNN). Swin Transformer is a hierarchical transformer structure with shifted windows. It combines the ability of convolutional neural networks to process large-scale images and extract multi-scale features with the global information interaction ability of transformers. In addition, our method reformulates height estimation as a classification-regression problem to improve model performance. Specifically, for each input image, our model adaptively divides the height range into several discrete bins, and the continuous height value is estimated as a linear combination of the predicted discrete bins and the height distribution probability. In experiments, we demonstrate qualitatively and quantitatively that the proposed method outperforms state-of-the-art approaches and that it generalizes well when applied to highway road construction sites.
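The classification-regression step described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the function names, the height range `h_min`/`h_max`, the use of a softmax to normalise the bin widths, and the use of bin centres as the representative value of each bin are all assumptions made for illustration. The sketch takes one vector of per-image bin-width scores and a stack of per-pixel scores over the same K bins, and returns the height map as the probability-weighted sum of the bin centres.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bin_centers(width_logits, h_min, h_max):
    """Turn K unnormalised width scores into K adaptive bin centres.

    Assumption: widths are normalised with a softmax so that they are
    positive and sum to the full height range [h_min, h_max].
    """
    widths = softmax(np.asarray(width_logits, dtype=float), axis=-1) * (h_max - h_min)
    edges = h_min + np.concatenate(([0.0], np.cumsum(widths)))
    return 0.5 * (edges[:-1] + edges[1:])


def estimate_height(pixel_logits, centers):
    """Linear combination of bin centres and per-pixel bin probabilities.

    pixel_logits: array of shape (K, H, W), one score per bin and pixel.
    centers:      array of shape (K,), from bin_centers().
    Returns an (H, W) height map.
    """
    probs = softmax(np.asarray(pixel_logits, dtype=float), axis=0)
    return np.tensordot(centers, probs, axes=(0, 0))


if __name__ == "__main__":
    # Hypothetical example: 4 bins over a 0-40 m range, 2x3 image.
    centers = bin_centers(np.zeros(4), h_min=0.0, h_max=40.0)
    heights = estimate_height(np.zeros((4, 2, 3)), centers)
    print(centers)
    print(heights)
```

Because the per-pixel probabilities sum to one, every estimated height is a convex combination of the bin centres and therefore stays inside the chosen height range, while the per-image widths let the bins concentrate where that image's heights actually lie.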