Target sizes in remote sensing images vary widely, which makes it difficult to capture effective features of targets at different scales and, in turn, to identify those targets reliably. In addition, when processing high-resolution images, conventional Transformers can run short of computational resources. Moreover, combining a single loss calculation method with the Hungarian algorithm can increase the fluctuation of the matching cost and degrade the convergence speed and accuracy of the algorithm. To address these problems, a multi-scale remote sensing target detection algorithm, named MSDAB-DETR, is proposed. First, the algorithm introduces a new multi-scale attention fusion module that exploits the differences between feature information at different resolutions to achieve multi-scale prediction on remote sensing images. Second, an efficient attention mechanism is adopted to improve the self-attention mechanism in the Transformer model, reducing the memory footprint of the original model. Finally, the SIoU loss function is used as the bounding box regression loss and is combined with the Hungarian algorithm to dampen fluctuations in bipartite graph matching, accelerate convergence, and further improve bounding box regression. Experimental results show that the method achieves detection accuracies of 95.3% and 71.5% on the NWPU VHR-10 and DIOR datasets, respectively. On the NWPU VHR-10 dataset, the average detection accuracy for small, medium, and large targets improves by 10.5%, 1.8%, and 2.7%, respectively, compared with the DAB-DETR model, while the memory footprint is reduced by about 9%.
remote sensing image detection; DAB-DETR model; multi-scale attention fusion; efficient attention Transformer; SIoU loss
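
The role of an IoU-based box cost in the one-to-one matching between predictions and ground-truth boxes can be illustrated with a minimal sketch. This is not the MSDAB-DETR implementation: it assumes plain IoU in place of the SIoU loss (which adds further terms not reproduced here), it ignores the classification cost, and it replaces the Hungarian algorithm with an exhaustive search that reaches the same minimum-cost assignment only for very small inputs. The function names `iou` and `match` are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(preds, targets):
    """Assign each target to a distinct prediction at minimum total cost.

    Cost is 1 - IoU (an assumed stand-in for an SIoU-based cost).
    Exhaustive search stands in for the Hungarian algorithm and is only
    practical for a handful of boxes. Assumes len(preds) >= len(targets).
    Returns (pairs, total_cost), where pairs holds (pred_idx, target_idx).
    """
    cost = [[1.0 - iou(p, t) for t in targets] for p in preds]
    best, best_cost = (), float("inf")
    # Each permutation picks one distinct prediction index per target.
    for perm in permutations(range(len(preds)), len(targets)):
        total = sum(cost[p][t] for t, p in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return [(p, t) for t, p in enumerate(best)], best_cost


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
    targets = [(21, 21, 31, 31), (0, 0, 9, 9)]
    pairs, total = match(preds, targets)
    print(pairs, round(total, 4))
```

Because the assignment minimizes the summed box cost, a box cost that changes smoothly as a prediction moves toward its target keeps the selected pairs more stable from one training step to the next, which is the kind of matching fluctuation the abstract refers to.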