Pitch Estimation Model Based on Mamba-UNet Architecture

Pitch estimation algorithms for single-source audio mainly include the Robust Algorithm for Pitch Tracking (RAPT), the Sawtooth Waveform Inspired Pitch Estimator (SWIPE), and Harvest. When the source is polyphonic music, such as vocals with musical accompaniment, these algorithms show clear shortcomings in vocal pitch estimation. Building on existing research, this paper improves the Robust Model for Vocal Pitch Estimation (RMVPE) and proposes Mamba-RMVPE, based on the Mamba-UNet architecture, to estimate vocal pitch in multi-source audio such as polyphonic music. Compared with the original RMVPE, Mamba-RMVPE improves Raw Pitch Accuracy (RPA), Raw Chroma Accuracy (RCA), and Overall Accuracy (OA), and substantially shortens inference time.
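
As a rough illustration of the three metrics named in the abstract, the sketch below computes RPA, RCA, and OA from per-frame pitch tracks. It assumes the common melody-extraction convention (a 50-cent tolerance, with 0 Hz marking an unvoiced frame) and treats an estimate of 0 Hz as unvoiced; the paper's exact evaluation protocol is not stated here, and the function name `pitch_metrics` is hypothetical.

```python
import numpy as np

def pitch_metrics(ref_hz, est_hz, tol_cents=50.0):
    """Simplified RPA / RCA / OA over aligned frames (0 Hz = unvoiced)."""
    ref_hz = np.asarray(ref_hz, dtype=float)
    est_hz = np.asarray(est_hz, dtype=float)
    ref_voiced = ref_hz > 0
    est_voiced = est_hz > 0
    both = ref_voiced & est_voiced

    # Signed pitch error in cents, only where both tracks carry a pitch.
    diff = np.zeros_like(ref_hz)
    diff[both] = 1200.0 * np.log2(est_hz[both] / ref_hz[both])

    # RPA: the estimate lies within the tolerance of the reference pitch.
    pitch_ok = both & (np.abs(diff) <= tol_cents)

    # RCA: same test after folding the error into one octave,
    # so octave errors are forgiven.
    chroma_diff = diff - 1200.0 * np.round(diff / 1200.0)
    chroma_ok = both & (np.abs(chroma_diff) <= tol_cents)

    n_voiced = max(int(ref_voiced.sum()), 1)  # guard against division by zero
    rpa = pitch_ok.sum() / n_voiced
    rca = chroma_ok.sum() / n_voiced

    # OA: correct-pitch voiced frames plus correctly unvoiced frames,
    # divided by the total number of frames.
    unvoiced_ok = (~ref_voiced) & (~est_voiced)
    oa = (pitch_ok.sum() + unvoiced_ok.sum()) / ref_hz.size
    return rpa, rca, oa

# Four frames: near-correct pitch, octave error, correct silence, missed voicing.
rpa, rca, oa = pitch_metrics([220.0, 220.0, 0.0, 440.0],
                             [221.0, 440.0, 0.0, 0.0])
```

In this toy input the octave error counts against RPA but not RCA, which is why RCA is never lower than RPA under these definitions.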

Keywords: polyphony; pitch estimation; Robust Model for Vocal Pitch Estimation (RMVPE); Mamba-UNet

Peng Zujian (彭祖剑)

Ucap Cloud Information Technology Co., Ltd. (开普云信息科技股份有限公司), Dongguan, Guangdong 523000, China

2024

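Audio Engineering (电声技术)
Television and Electroacoustics Research Institute (The Third Research Institute of China Electronics Technology Group Corporation)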

Impact factor: 0.259
ISSN: 1002-8684
Year, volume (issue): 2024, 48(9)