YOLOX-S Based Acousto-optical Information Fusion Target Recognition Algorithm
To address the limitations of single-sensor detection on the modern battlefield, and the shortcomings of single-modality target recognition such as incomplete information and susceptibility to noise, a new target recognition method combining the acoustic and optical modalities was proposed. In this method, the log-Mel spectral coefficient features of the voiceprint information were extracted by a deep convolutional residual network, while the optical features of the target were extracted by the YOLOX-S network, which also computed the target's position in the image and its category. A branch for processing acoustic features was then introduced into the decoupled head of the prediction part of the YOLOX-S model. The optical and acoustic features of the target were spatially normalized on the classification branch of the YOLOX-S detection head, so that the visual and acoustic data could be mapped into a common domain in which they can be concatenated and fused, and the fused acousto-optical features could then be used for target recognition and inference. The experimental results showed that fusing voiceprint information with image information provided a more comprehensive perception capability and made the detection and recognition of objects more accurate and reliable.
target recognition; feature fusion; YOLOX-S; voiceprint features
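
The fusion step summarized in the abstract (normalizing both modalities and concatenating them on the classification branch) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `l2_normalize` and `fuse_audio_visual`, the tensor shapes, the use of channel-wise L2 normalization as the normalization step, and the choice to broadcast a single clip-level audio embedding to every spatial cell.

```python
import numpy as np


def l2_normalize(x, axis, eps=1e-12):
    """Scale x to unit L2 norm along the given axis."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def fuse_audio_visual(visual_feat, audio_embed):
    """Concatenate a normalized audio embedding onto every spatial cell.

    visual_feat: (C_v, H, W) classification-branch feature map (assumed shape).
    audio_embed: (C_a,) clip-level voiceprint embedding (assumed shape).
    Returns an array of shape (C_v + C_a, H, W).
    """
    _, h, w = visual_feat.shape
    # Normalize each modality so both live on a comparable scale.
    v = l2_normalize(visual_feat, axis=0)
    a = l2_normalize(audio_embed, axis=0)
    # Tile the audio vector over the H x W grid, then stack along channels.
    a_map = np.broadcast_to(a[:, None, None], (a.shape[0], h, w))
    return np.concatenate([v, a_map], axis=0)


# Hypothetical sizes: 128 visual channels on a 20 x 20 grid, 64 audio channels.
rng = np.random.default_rng(0)
visual = rng.standard_normal((128, 20, 20))
audio = rng.standard_normal(64)
fused = fuse_audio_visual(visual, audio)
print(fused.shape)  # (192, 20, 20)
```

In a real detection head, the fused map would feed the subsequent classification convolutions; this sketch only shows how the two feature sets might be brought to a shared shape and scale before concatenation.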