Research on the Application of Deep Learning Technology in Automatic Audio Tagging

Wang Peigang (王培刚)¹

Author information

1. Hubei Communications Technical College, Wuhan, Hubei 430202

Abstract

Automatic audio tagging aims to generate, from an audio input, a passage of text that describes that audio. Current audio tagging models perform poorly, and preloading models have rarely been applied to improve them. Because the task is to produce a suitable descriptive sentence for an audio clip, a tagging system must be able to process both audio-modality and text-modality data. This paper therefore studies preloading models for the audio and text modalities and proposes two automatic tagging systems, one based on the audio modality and one based on the text modality, to address the mismatch between the training and testing objectives of traditional tagging methods.

Key words

Audio tagging / Automatic tagging / Deep learning / Preloading model

Publication year

2024

科技资讯 (Science & Technology Information)

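Beijing International Science and Technology Service Center; Beijing Cooperative Innovation International Science and Technology Service Center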

Impact factor: 0.51

ISSN: 1672-3791
