Object detection is a core task in computer vision that aims to accurately identify and locate objects of interest in images or videos. An improved object detection algorithm was proposed that incorporates feature fusion, optimizes the inter-layer transmission method of the encoder, and introduces a random jump retention method. These improvements address limitations of general Transformer models in object detection tasks. Specifically, because computational constraints restrict Transformer vision models to a single layer of features, their perception of object information is insufficient; to counteract this, a convolutional attention mechanism was used to achieve effective multi-scale feature fusion, enhancing object recognition and localization. By optimizing the transfer mode between encoder layers, each encoder layer transmitted and learned more information, reducing information loss between layers. Additionally, to address the problem that predictions from intermediate decoder stages outperformed those from the final stage, a random jump retention method was designed to improve the model's prediction accuracy and stability. Experimental results demonstrated that the improved method improved performance in object detection tasks. On the COCO 2017 dataset, the model's AP reached 42.3%, and the AP for small objects improved by 2.2%; on the PASCAL VOC 2007 dataset, the model's AP improved by 1.4%, and the AP for small objects improved by 2.4%.
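To make the idea of fusing multi-scale features into a single feature layer more concrete, the following is a minimal sketch only. It is not the proposed algorithm: the abstract does not specify the convolutional attention mechanism, so a simple softmax gate over scales, computed from global average pooling, stands in for it here. The function names (`upsample_nearest`, `fuse_scales`), the use of single-channel square maps, and the choice of the finest resolution as the fusion target are all assumptions made for illustration.

```python
import math


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D map (list of rows) by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def global_avg(fmap):
    """Global average pooling: one scalar summary per feature map."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


def softmax(xs):
    """Numerically stable softmax over a list of scalars."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(fmaps):
    """Fuse square single-channel maps, ordered fine to coarse, into one map.

    Assumption (hypothetical stand-in for convolutional attention): each
    scale gets one scalar weight from a softmax over its pooled response.
    Each coarser side length must divide the finest side length.
    """
    target = len(fmaps[0])
    resized = [upsample_nearest(f, target // len(f)) for f in fmaps]
    weights = softmax([global_avg(f) for f in resized])
    fused = [
        [sum(w * r[i][j] for w, r in zip(weights, resized)) for j in range(target)]
        for i in range(target)
    ]
    return fused, weights


if __name__ == "__main__":
    fine = [[1.0] * 4 for _ in range(4)]
    mid = [[2.0] * 2 for _ in range(2)]
    coarse = [[3.0]]
    fused, weights = fuse_scales([fine, mid, coarse])
    print(len(fused), len(fused[0]), weights)
```

The fused map has the resolution of the finest input, so a Transformer encoder restricted to a single feature layer could still receive information from coarser scales. A learned convolutional attention module would replace the fixed pooling-plus-softmax gate with trainable, spatially varying weights.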