Object Detection Based on Semantic Multi-scale Convolution and Attention Mechanism
In object detection, scale variation among objects is one of the most challenging problems, so fusing effective multi-scale features is particularly important. This paper proposes SMSFF (semantic multi-scale feature fusion), a component that fuses multi-scale features at a fine-grained level. First, multi-scale convolution kernels generate the multi-scale semantic information required by the object detection network, which is then fully fused by a novel multi-scale feature fusion method. In addition, the channel-wise weighted attention of SE (squeeze-and-excitation) re-calibrates the multi-scale features, which strengthens the multi-scale information in the network and thereby improves its feature representation ability. As a result, SMSFF improves detection accuracy and makes the model more robust to object instances of different scales. With the YOLOX object detector, the proposed method achieves an mAP of 48.6% on COCO 2017 test and 87.6% on Pascal VOC.
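
To illustrate the channel re-calibration step only, the following is a minimal sketch of a generic squeeze-and-excitation gate in plain Python. It is not the SMSFF implementation: the abstract does not specify the fusion method, the kernel sizes, or where SE is placed. The function name `se_recalibrate`, the bias-free weight matrices `w1` and `w2`, and the toy input are assumptions made for illustration.

```python
import math


def se_recalibrate(features, w1, w2):
    """Generic SE gate (illustrative sketch, not the paper's code).

    features: list of C channels, each an H x W nested list of floats.
    w1: (C/r) x C bottleneck weights; w2: C x (C/r) expansion weights.
    Both weight matrices are hypothetical and bias-free for brevity.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features
    ]
    # Excitation, part 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, part 2: expansion layer followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch] for ch, g in zip(features, gates)]
    return scaled, gates


# Toy input: two 2 x 2 channels standing in for fused multi-scale features.
features = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[1.0, 1.0]]    # reduce 2 channels to 1 hidden unit
w2 = [[0.0], [1.0]]  # expand back to 2 channel gates
scaled, gates = se_recalibrate(features, w1, w2)
```

In this toy case the first gate is exactly 0.5 because its expansion weight is zero, so the first channel is halved, while the second channel is scaled by a gate close to 1. In a trained network the weights are learned, so the gates reflect which channels carry the most useful information.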