Enhanced feature extraction network for small pedestrian detection in forest remote sensing images
The key to mitigating property and casualty losses caused by forest disasters lies in accurately locating the individuals involved. However, in unmanned aerial vehicle (UAV) remote sensing image detection, pedestrians in such complex scenes appear as small objects and are prone to missed detections and localization deviation due to information distortion and loss. To address the challenge of accurately detecting pedestrians in complex forest scenes, a new object detection network with enhanced feature extraction (EFEN) was proposed in this study, which incorporated a receptive field enhancement module and optimized both the loss function and the data preprocessing. The receptive field block (RFB) was reconstructed into a module called RFBA, which was embedded into the YOLOv4 backbone to expand the network's receptive field. RFBA eliminated the risk of information loss in dilated convolution while retaining the high performance of the original module in processing multi-scale and contextual features. However, model training was inevitably disturbed by uninformative pixels introduced by the enlarged receptive field. Thus, the convolutional block attention module (CBAM) was integrated into the network, which enhanced the network's ability to process information of various scales, shapes, and directions. CBAM quantified the information conveyed by features by analyzing the correlation between features across multiple channels and different spatial locations, and assigned weights to filter out features with little information and contribution. The loss function was further optimized by combining the Normalized Wasserstein Distance (NWD) similarity measurement with CIOU into a loss called GKCLOSS (Gaussian Kullback-Leibler and Complete-IOU loss), which alleviated the sensitivity of small objects to positional offsets. In addition, a new segmentation training strategy was introduced to deal with the imbalanced samples of small targets in remote sensing images. Each image was segmented, and the recognition areas that contained or surrounded the targets were adaptively screened. Extensive experiments were conducted on the Chongli Winter Olympic Games pedestrian dataset, showcasing the performance of the EFEN framework in small object detection. The mean average precision (mAP) on this dataset reached 39.10%, an improvement over the SSD, YOLOv5, and YOLOv7 algorithms, which demonstrated the effectiveness of this method in the small object detection task.
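The sensitivity of small objects to positional offsets, which motivates the NWD term in the loss, can be illustrated with a minimal Python sketch. It assumes the commonly used NWD formulation in which a box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). The function names `nwd` and `iou` and the normalizing constant `c` are illustrative assumptions, plain IoU stands in for CIOU, and how GKCLOSS actually weights its terms is not stated here.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    Each box is treated as a Gaussian N((cx, cy), diag(w^2/4, h^2/4)).
    `c` is a dataset-dependent normalizing constant; the default used
    here is an assumption chosen only for illustration.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_squared = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
                  + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_squared) / c)


def iou(box_a, box_b):
    """Plain IoU between two (cx, cy, w, h) boxes, used as a baseline."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union


# The same 3-pixel horizontal offset applied to a small and a large box.
small_gt, small_pred = (10, 10, 6, 6), (13, 10, 6, 6)
large_gt, large_pred = (10, 10, 60, 60), (13, 10, 60, 60)

iou_small = iou(small_gt, small_pred)
iou_large = iou(large_gt, large_pred)
nwd_small = nwd(small_gt, small_pred)
nwd_large = nwd(large_gt, large_pred)
```

With these example boxes, the 3-pixel shift lowers IoU to about 0.33 for the 6×6 box but only to about 0.90 for the 60×60 box, whereas NWD is about 0.79 in both cases. An overlap-based measure therefore penalizes the same offset far more heavily on small targets, which is the imbalance a distance-based term is meant to reduce.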
remote sensing image; pedestrian mini-target; enhanced feature extraction; receptive field block attention module; GKCLOSS loss function