Vehicle Recognition Based on Spectral-Temporal Resolution Optimization and Residual Spatial Pyramid Network
Vehicle classification is a key technology in intelligent transportation systems and a vital research area in road traffic monitoring. Owing to the advantages of acoustic sensors, such as high efficiency, low cost, round-the-clock operation, and strong concealment, vehicle classification based on vehicle sound characteristics has been extensively researched. However, existing work mostly considers sound signals that contain a single vehicle, with limited discussion of classifying mixed two-vehicle sound signals. To address this research gap, a network model is developed to classify noise signals from one or two vehicles. Because a fixed resolution in sound spectral features is suboptimal, a spectral-temporal resolution optimization model is designed using the attention scores and the frame warping matrix obtained from network training. The classification network is based on a Convolutional Recurrent Neural Network (CRNN) architecture, whose convolutional component (a multiscale signal reconstruction module) uses an efficient spatial pyramid for double-branch fusion. Since Recurrent Neural Networks (RNNs) and other recurrent architectures are poorly suited to parallelization and run slowly, the recurrent component is replaced by a Temporal Convolutional Network (TCN), and the causal TCN is converted to a non-causal cyclic TCN. The mean Average Precision (mAP) of the model on the self-collected dataset reaches 0.98, significantly outperforming a CRNN with a comparable parameter count. Its performance is comparable to that of MobileNetV3, but with 1.7 × 10^6 fewer parameters. Experimental results indicate that the designed model is effective for processing long-duration sound signals and extracting deep features.
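The difference between a causal and a non-causal temporal convolution can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `dilated_conv1d`, the zero padding, and the centered-kernel form of the non-causal case are assumptions made for illustration, and the "cyclic" aspect of the TCN is not modeled because the abstract does not specify it.

```python
def dilated_conv1d(x, w, dilation=1, causal=True):
    """Dilated 1-D convolution with zero padding; output has the same length as x.

    Hypothetical helper for illustration only.
    causal=True : output[t] depends on x[t], x[t-d], x[t-2d], ... (past only)
    causal=False: kernel is centered on t, so output[t] also sees future frames
    """
    k = len(w)
    if not causal and k % 2 == 0:
        raise ValueError("non-causal mode expects an odd kernel length")
    # Offset of each kernel tap relative to the output index t.
    if causal:
        offsets = [-(k - 1 - j) * dilation for j in range(k)]
    else:
        offsets = [(j - k // 2) * dilation for j in range(k)]
    y = []
    for t in range(len(x)):
        acc = 0.0
        for wj, off in zip(w, offsets):
            i = t + off
            if 0 <= i < len(x):  # taps outside the signal read zero padding
                acc += wj * x[i]
        y.append(acc)
    return y


# A unit impulse at frame 3 shows which outputs each variant lets the frame reach.
impulse = [0.0] * 7
impulse[3] = 1.0
w = [1.0, 1.0, 1.0]

causal_out = dilated_conv1d(impulse, w, dilation=1, causal=True)
noncausal_out = dilated_conv1d(impulse, w, dilation=1, causal=False)
print(causal_out)     # the impulse influences frames 3, 4, 5 only
print(noncausal_out)  # the impulse influences frames 2, 3, 4
```

In the causal case a frame only affects the current and later outputs, which suits streaming prediction. In the non-causal case each output also draws on later frames, which is acceptable when a whole recording is classified offline.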