Abstract
Acoustic source localization (ASL) and sound event detection (SED) have largely been pursued as two independent research fields. In recent years, sound event localization and detection (SELD) has become a very active research topic because it provides a more complete spatial and temporal representation of the sound field. This paper presents a deep learning-based algorithm for localizing and detecting multiple overlapping sound events in three-dimensional space. Log-Mel spectra and generalized cross-correlation spectra are concatenated along the channel dimension to form the input features. A trained neural network classifies and regresses these features in parallel to produce the sound recognition and localization results, respectively. A channel attention mechanism is also introduced in the network to selectively enhance features that carry essential information and suppress uninformative ones. Finally, a thorough comparison confirms the efficiency and effectiveness of the proposed SELD algorithm. Field experiments show that the proposed algorithm is robust to reverberation and environmental conditions and achieves higher recognition and localization accuracy than the baseline method.
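The feature construction and channel attention summarized in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the array shapes (4 microphone channels, 6 microphone pairs, 100 frames, 64 bins), the helper names `stack_features` and `channel_attention`, the random weights, and the use of a squeeze-and-excitation style gate as the attention form.

```python
import numpy as np


def stack_features(log_mel, gcc):
    """Concatenate log-Mel and GCC features along the channel axis (axis 0).

    Both inputs are assumed to have shape (channels, time, bins) with the same
    time and bin sizes.
    """
    if log_mel.shape[1:] != gcc.shape[1:]:
        raise ValueError("log-Mel and GCC features must share time and bin sizes")
    return np.concatenate([log_mel, gcc], axis=0)


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel gate (an assumed attention form).

    x:  (C, T, F) feature tensor
    w1: (C // r, C) weights of the bottleneck layer
    w2: (C, C // r) weights of the expansion layer
    """
    squeeze = x.mean(axis=(1, 2))                    # global average per channel, (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)           # ReLU bottleneck, (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate in (0, 1), (C,)
    return x * weights[:, None, None], weights       # rescale each channel


# Hypothetical data: random values stand in for real features.
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((4, 100, 64))   # one log-Mel map per microphone
gcc = rng.standard_normal((6, 100, 64))       # one GCC map per microphone pair
features = stack_features(log_mel, gcc)       # shape (10, 100, 64)

# Untrained random weights with reduction ratio r = 2; a real network learns these.
w1 = rng.standard_normal((5, 10)) * 0.1
w2 = rng.standard_normal((10, 5)) * 0.1
reweighted, weights = channel_attention(features, w1, w2)
```

In a trained network the gate weights would be learned jointly with the classification and regression branches, so that channels carrying useful cues receive weights closer to 1.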
Funding
National Natural Science Foundation of China (61877067)
Foundation of Science and Technology on Near-Surface Detection Laboratory (TCGZ2019A002)
Foundation of Science and Technology on Near-Surface Detection Laboratory (TCGZ2021C003)
Foundation of Science and Technology on Near-Surface Detection Laboratory (6142414200511)
Natural Science Foundation of Shaanxi Province (2021JZ-19)