Position-sensitive Transformer model for object detection in aerial images
To address the challenge of detecting numerous small objects in UAV-captured aerial images, this paper introduces the Position-Sensitive Transformer Object Detection (PS-TOD) model. First, it presents a multi-scale feature fusion (MSFF) module incorporating a Positional Channel Embedded 3D Attention (PCE3DA) mechanism. PCE3DA exploits the interplay between spatial and channel information to generate 3D attention, strengthening feature representation in regions of interest; on this basis, a bottom-up, cross-layer MSFF strategy enriches the semantics of the fused features. Second, it proposes a novel Position-Sensitive Self-Attention (PSSA) mechanism, from which a position-sensitive Transformer encoder-decoder is constructed. This design heightens the model's sensitivity to object positions while capturing long-range dependencies in the image's global context. Comparative experiments on the VisDrone dataset show that PS-TOD attains an average precision (AP) of 28.8%, a 4.1-percentage-point improvement over the baseline model (DETR). The model also detects objects accurately in UAV aerial imagery with complex backgrounds, markedly improving detection accuracy for small targets.
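The abstract does not specify PCE3DA's internals, so the following is only an illustrative NumPy sketch of the general idea it names: deriving a channel descriptor and a spatial descriptor from a feature map and combining them, via broadcasting, into a single 3D (channel-and-spatial) attention map. All function and variable names here are hypothetical, not from the paper.

```python
import numpy as np

def attention_3d_sketch(x: np.ndarray) -> np.ndarray:
    """Illustrative 3D attention over a feature map x of shape (C, H, W).

    Hypothetical sketch only: the actual PCE3DA module uses learned
    positional channel embeddings, which are omitted here.
    """
    # Channel descriptor: global average pool over spatial dims -> (C, 1, 1)
    channel_desc = x.mean(axis=(1, 2), keepdims=True)
    # Spatial descriptor: average over channels -> (1, H, W)
    spatial_desc = x.mean(axis=0, keepdims=True)
    # Interplay: broadcast-sum yields full (C, H, W) attention logits,
    # so every position gets a weight that depends on both its channel
    # and its spatial location.
    logits = channel_desc + spatial_desc
    attn = 1.0 / (1.0 + np.exp(-logits))  # sigmoid gate in (0, 1)
    # Re-weight the input features element-wise
    return x * attn
```

Because the gate lies in (0, 1), the output is a soft re-weighting of the input rather than a hard mask, which is the usual design choice for attention modules inserted into feature-fusion paths.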