Urban audio scene recognition method integrating semantic information

Urban scenes are easily confused and hard to distinguish in audio scene recognition. To address this, the paper proposes an urban audio scene recognition method that integrates semantic information. The algorithm first uses voice activity detection to separate speech from environmental sound, then performs scene-type recognition on the speech and on the environmental sound separately, and finally combines the two sets of scene probabilities through information-entropy weighting to obtain an audio scene type that integrates semantic information. The method effectively addresses the poor classification results that traditional environmental audio scene recognition methods give for easily confused, low-discriminability scenes. Experiments show that the proposed method noticeably improves recognition of easily confused urban audio scenes such as basketball courts and supermarkets, and the results also demonstrate the importance of semantic information for urban audio scene recognition.
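The abstract does not give the exact weighting formula, so the following is only a minimal sketch of one common way to weight two classifiers' scene probabilities by information entropy: the branch whose distribution has lower entropy (that is, the more confident branch) receives the larger weight. The function names, the normalization by the maximum entropy, and the example probabilities are all assumptions made for illustration, not the authors' implementation.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weighted_fusion(p_speech, p_env):
    """Fuse two scene-probability vectors over the same scene classes.

    Assumed scheme: each branch gets a confidence of 1 - H / H_max,
    and the weights are the confidences normalized to sum to 1.
    """
    if len(p_speech) != len(p_env) or len(p_speech) < 2:
        raise ValueError("both vectors must cover the same >= 2 classes")

    h_max = math.log(len(p_speech))  # entropy of the uniform distribution
    c_speech = max(0.0, 1.0 - entropy(p_speech) / h_max)
    c_env = max(0.0, 1.0 - entropy(p_env) / h_max)

    total = c_speech + c_env
    if total < 1e-12:
        # Both branches are (numerically) uniform: fall back to equal weights.
        w_speech = w_env = 0.5
    else:
        w_speech, w_env = c_speech / total, c_env / total

    fused = [w_speech * ps + w_env * pe for ps, pe in zip(p_speech, p_env)]
    return fused, (w_speech, w_env)


# Hypothetical example: the environmental-sound branch cannot separate
# the scenes well, while the speech (semantic) branch is confident.
scenes = ["basketball court", "supermarket", "street"]
p_env = [0.40, 0.35, 0.25]
p_speech = [0.05, 0.90, 0.05]

fused, (w_speech, w_env) = entropy_weighted_fusion(p_speech, p_env)
best = scenes[max(range(len(fused)), key=fused.__getitem__)]
print(best, [round(p, 3) for p in fused], round(w_speech, 3), round(w_env, 3))
```

In this made-up example the speech branch dominates the fused result because its distribution is far from uniform, which is the behavior the abstract attributes to semantic information in easily confused scenes.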

audio scene recognition; semantic information; CNN; BiLSTM; information entropy; information fusion

Nong Wentao (农文韬), Sun Yutong (孙雨桐), Mei Yu (梅宇)

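School of Geography, Nanjing Normal University, Nanjing, Jiangsu 210023

Key Laboratory of Virtual Geographic Environment (Ministry of Education), Nanjing Normal University, Nanjing, Jiangsu 210023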

Postgraduate Research & Practice Innovation Program of Jiangsu Province

KYCX23_1716

2024

Wireless Internet Technology (无线互联科技)
Jiangsu Institute of Scientific and Technical Information

Impact factor: 0.263
ISSN: 1672-6944
Year, volume (issue): 2024, 21(17)