Remote Sensing Image Detection Based on Perceptually Enhanced Swin Transformer
Owing to the rapid development of remote sensing technology, remote sensing image detection is being used extensively in agriculture, military, national defense security, and other fields. Compared with conventional images, objects in remote sensing images are more difficult to detect; therefore, researchers have sought to detect them efficiently and accurately. To address the high computational complexity, wide range of object scales, and scale imbalance of remote sensing images, this study proposes a perceptually enhanced Swin Transformer network that improves detection in remote sensing images. Exploiting the hierarchical design and shifted windows of the basic Swin Transformer, the network inserts spatial local perception blocks into each stage, thus enhancing local feature extraction while adding negligible computational cost. An area-distributed regression loss is introduced to assign larger weights to small objects and thereby counter scale imbalance; additionally, the network is combined with an improved IoU-aware classification loss to eliminate the discrepancy between the classification and regression branches and to reduce both losses. Experimental results on the public dataset DOTA show that the proposed network yields a mean Average Precision (mAP) of 78.47% and a detection speed of 10.8 frames/s, thus demonstrating its superiority over classical object detection networks (i.e., Faster R-CNN and Mask R-CNN) and existing high-performing remote sensing image detection networks. Additionally, the network performs well on all types of objects at different scales.
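The idea of assigning larger regression weights to small objects can be illustrated with a minimal sketch. The abstract does not give the form of the area-distributed regression loss, so the inverse-square-root weighting, the normalization to a mean weight of 1, and the function names `area_weights` and `weighted_regression_loss` below are all assumptions made for illustration, not the formulation used in this study.

```python
import math


def area_weights(areas):
    """Return one weight per box; smaller areas receive larger weights.

    Assumption: weight is proportional to 1 / sqrt(area), then rescaled
    so the weights average to 1 (hypothetical choice, not from the paper).
    """
    raw = [1.0 / math.sqrt(a) for a in areas]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


def weighted_regression_loss(per_box_losses, areas):
    """Average the per-box regression losses using area-based weights."""
    weights = area_weights(areas)
    total = sum(w * l for w, l in zip(weights, per_box_losses))
    return total / len(per_box_losses)


# Three boxes with areas in pixels: small, medium, large.
weights = area_weights([16.0, 400.0, 10000.0])
loss = weighted_regression_loss([0.8, 0.5, 0.2], [16.0, 400.0, 10000.0])
```

Because the weights are normalized to a mean of 1, the overall loss magnitude stays comparable to an unweighted average, while the small box contributes most of the gradient signal.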