Urban audio scene recognition method integrating semantic information

Nong Wentao (农文韬)¹, Sun Yutong (孙雨桐)¹, Mei Yu (梅宇)¹

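Author information

1. School of Geography, Nanjing Normal University, Nanjing, Jiangsu 210023, China; Key Laboratory of Virtual Geographic Environment (Ministry of Education), Nanjing Normal University, Nanjing, Jiangsu 210023, China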
Abstract

Urban scenes are easily confused and hard to tell apart in audio scene recognition. To address this, the paper proposes an urban audio scene recognition method that integrates semantic information. The algorithm first uses voice activity detection to separate speech from environmental sound, then classifies the scene type from the speech and from the environmental sound independently, and finally combines the two sets of scene probabilities through information-entropy weighting to obtain a scene type that incorporates semantic information. The method effectively addresses the poor performance of traditional environmental audio scene recognition on easily confused, low-discriminability scenes. Experiments show that it noticeably improves recognition of easily confused urban audio scenes such as basketball courts and supermarkets, and the results also demonstrate the importance of semantic information for urban audio scene recognition.
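The fusion step can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the rule below is an assumption made for illustration only: each branch receives a confidence of 1 - H / log(K), where H is the Shannon entropy of its scene probabilities and K is the number of scene classes, and the confidences are normalised to sum to 1. The function names, class labels, and probability values are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weighted_fusion(p_speech, p_env):
    """Fuse two scene-probability vectors, trusting the lower-entropy one more.

    Assumed weighting (not taken from the paper): confidence = 1 - H / log(K),
    normalised across the two branches.
    """
    if len(p_speech) != len(p_env):
        raise ValueError("both branches must score the same scene classes")
    k = len(p_speech)
    if k < 2:
        raise ValueError("at least two scene classes are required")

    max_h = math.log(k)  # entropy of the uniform distribution over K classes
    conf_speech = max(0.0, 1.0 - entropy(p_speech) / max_h)
    conf_env = max(0.0, 1.0 - entropy(p_env) / max_h)

    total = conf_speech + conf_env
    if total <= 1e-12:
        # Both branches are (near-)uniform: fall back to equal weights.
        w_speech = 0.5
    else:
        w_speech = conf_speech / total
    w_env = 1.0 - w_speech

    fused = [w_speech * a + w_env * b for a, b in zip(p_speech, p_env)]
    return fused, (w_speech, w_env)


# Hypothetical example: the environmental branch cannot separate the first
# two scenes, while the speech (semantic) branch is confident.
scenes = ["basketball court", "supermarket", "street"]
p_env = [0.38, 0.40, 0.22]
p_speech = [0.85, 0.10, 0.05]

fused, weights = entropy_weighted_fusion(p_speech, p_env)
print("weights (speech, environment):", weights)
print("fused scene:", scenes[fused.index(max(fused))])
```

Under this assumed rule, a branch whose output is close to uniform contributes little, which matches the stated intent of letting semantic information settle cases where environmental sound alone is ambiguous.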

Key words

audio scene recognition / semantic information / CNN / BiLSTM / information entropy / information fusion

Funding

Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX23_1716)

Publication year

2024

Wireless Internet Technology (无线互联科技)

Jiangsu Institute of Scientific and Technical Information

Impact factor: 0.263

ISSN: 1672-6944