In recent years, the growing use of Unmanned Aerial Vehicles (UAVs) in dual-use operations during both day and night has posed significant challenges to the accuracy of ground target detection. To address the low accuracy of UAV target detection algorithms in complex backgrounds, an improved dual-spectrum image fusion network based on YOLOv8, called the Progressive Cross-Modal Fusion Network (PCMFNet), is proposed. First, a feature-level fusion strategy is employed that fully leverages both visible-light and infrared data, significantly enhancing target detection performance in complex scenarios. Second, a lightweight feature extraction branch is designed to mitigate the redundant information that multi-modal fusion can introduce. Finally, a Progressive Cross-Modal Fusion Module is proposed that strengthens the interaction between multi-modal features and effectively addresses the decline in detection performance caused by differences between modalities. Experimental results show that PCMFNet achieves higher precision and recall than the original algorithm. Specifically, on the LLVIP dataset, it reaches an AP50 of 91.2% and an mAP@0.5:0.95 of 55.5% at a rate of 95 frames per second (FPS). In conclusion, PCMFNet demonstrates superior performance compared with existing algorithms, and future work will explore its application in a wider range of environments.
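The abstract does not specify the internals of the Progressive Cross-Modal Fusion Module, so the following NumPy snippet is only an illustrative sketch of the general idea of feature-level cross-modal fusion: each modality's feature map is re-weighted by a channel-attention gate derived from the other modality before the two are combined. The function name, gating scheme, and shapes are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_fusion(vis, ir):
    """Illustrative feature-level fusion of two modalities.

    vis, ir: feature maps of shape (C, H, W) from the visible-light
    and infrared branches. Each modality is gated by a channel
    descriptor computed from the other, then the gated maps are
    summed. This is a simplified stand-in, NOT the paper's actual
    Progressive Cross-Modal Fusion Module.
    """
    # Global average pooling -> per-channel descriptor, shape (C, 1, 1)
    vis_desc = vis.mean(axis=(1, 2), keepdims=True)
    ir_desc = ir.mean(axis=(1, 2), keepdims=True)
    # Cross-gating: each modality is modulated by the other's gate
    fused = sigmoid(ir_desc) * vis + sigmoid(vis_desc) * ir
    return fused

# Toy example: 8-channel feature maps at 16x16 spatial resolution
rng = np.random.default_rng(0)
vis = rng.standard_normal((8, 16, 16))
ir = rng.standard_normal((8, 16, 16))
out = cross_modal_fusion(vis, ir)
print(out.shape)  # (8, 16, 16)
```

In a multi-scale detector such as YOLOv8, a fusion step of this kind would typically be applied at several backbone stages, which is consistent with the "progressive" fusion described above.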