Urban audio scene recognition method integrating semantic information
To address the problem that urban scenes are easily confused and hard to distinguish in audio scene recognition, this paper proposes an urban audio scene recognition method that integrates semantic information. The algorithm first separates speech from environmental sound using voice activity detection, then identifies the scene type from the speech and from the environmental sound separately, and finally fuses the two sets of scene probabilities, weighted by information entropy, to obtain an audio scene type that incorporates semantic information. The method effectively addresses the poor classification of easily confused, low-discrimination audio scenes by traditional environmental audio scene recognition methods. Experiments show that the proposed method significantly improves the recognition of easily confused urban audio scenes such as basketball courts and supermarkets, and the results also demonstrate the importance of semantic information for urban audio scene recognition.
audio scene recognition; semantic information; CNN; BiLSTM; information entropy; information fusion
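
The entropy-weighted fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact weighting formula, so the rule used here is an assumption: each branch (speech and environmental sound) receives a weight proportional to one minus its normalized Shannon entropy, so that the more confident branch contributes more. The function names `shannon_entropy` and `fuse_scene_probabilities`, the scene labels, and the probability values are hypothetical and are not taken from the paper's experiments.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (natural log) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse_scene_probabilities(speech_probs, env_probs):
    """Fuse two scene distributions with entropy-derived weights.

    Assumed rule (not confirmed by the source): the weight of a branch is
    proportional to 1 - H / log(K), where H is the entropy of its scene
    distribution and K is the number of scene classes.
    """
    k = len(speech_probs)
    if k != len(env_probs) or k < 2:
        raise ValueError("both inputs must cover the same K >= 2 scene classes")

    max_entropy = math.log(k)
    # Confidence is 0 for a uniform distribution and 1 for a one-hot distribution.
    conf_speech = max(0.0, 1.0 - shannon_entropy(speech_probs) / max_entropy)
    conf_env = max(0.0, 1.0 - shannon_entropy(env_probs) / max_entropy)

    total = conf_speech + conf_env
    if total <= 1e-12:
        # Both branches are (numerically) uniform: fall back to equal weights.
        w_speech = w_env = 0.5
    else:
        w_speech = conf_speech / total
        w_env = conf_env / total

    return [w_speech * ps + w_env * pe for ps, pe in zip(speech_probs, env_probs)]


# Illustrative values only: the speech branch is confident, the
# environmental-sound branch is nearly undecided.
scenes = ["basketball court", "supermarket", "street"]
speech_probs = [0.8, 0.1, 0.1]
env_probs = [0.3, 0.4, 0.3]

fused = fuse_scene_probabilities(speech_probs, env_probs)
predicted_scene = scenes[fused.index(max(fused))]
print(predicted_scene, [round(p, 3) for p in fused])
```

Because the weights sum to one, the fused vector is itself a probability distribution. In this illustrative case the ambiguous environmental-sound prediction is outweighed by the more confident speech-based prediction, which is the behaviour the abstract attributes to integrating semantic information.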