Hierarchical Regulated Iterative Network for Joint Task of Music Detection and Music Relative Loudness Estimation

Abstract: One practical requirement of music copyright management is the estimation of music relative loudness, which existing music detection work has largely ignored. To address this, we study the joint task of music detection and music relative loudness estimation. We observe that the joint task has two characteristics, temporality and hierarchy, that can be exploited to solve it. For example, a short fragment of audio is temporally related to its neighboring fragments because they may all belong to the same event, and the event classes of the fragment in the two tasks have a hierarchical relationship. Based on these observations, we reformulate the joint task as a hierarchical event detection and localization problem. To solve this problem, we propose Hierarchical Regulated Iterative Networks (HRIN), with two variants, HRIN-r and HRIN-cr, built on recurrent and convolutional recurrent modules, respectively. To exploit the joint task's characteristics, our models employ an iterative framework that strengthens temporal modeling, together with three hierarchical violation penalties that regulate the hierarchy. Extensive experiments on OpenBMAT, currently the largest dataset for this task, show that HRIN achieves promising performance in both segment-level and event-level evaluations.
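The abstract does not state the form of the three hierarchical violation penalties. As a minimal sketch, assuming frame-level probabilities for music detection (music vs. no music) and three relative-loudness classes (foreground music, background music, no music), one plausible hinge-style penalty charges a frame whenever a fine-grained loudness class is more probable than its parent class in the detection task. The function name, its inputs, and the three hinge terms are hypothetical illustrations, not the definitions used in HRIN.

```python
from typing import Sequence, Tuple


def hierarchy_violation_penalty(
    p_music: Sequence[float],
    p_loudness: Sequence[Tuple[float, float, float]],
) -> float:
    """Hypothetical hinge penalty for hierarchy violations (not the HRIN formulation).

    p_music[t]    -- assumed probability that frame t contains music (detection task).
    p_loudness[t] -- assumed (p_fg, p_bg, p_none) probabilities for frame t
                     (relative-loudness task: foreground, background, no music).
    Returns the mean per-frame penalty; 0.0 means no child class exceeds its parent.
    """
    if len(p_music) != len(p_loudness):
        raise ValueError("both tasks must provide one prediction per frame")
    if not p_music:
        raise ValueError("at least one frame is required")

    total = 0.0
    for pm, (p_fg, p_bg, p_none) in zip(p_music, p_loudness):
        # Term 1: foreground music should imply music.
        total += max(0.0, p_fg - pm)
        # Term 2: background music should imply music.
        total += max(0.0, p_bg - pm)
        # Term 3: "no music" in the loudness task should imply no music in detection.
        total += max(0.0, p_none - (1.0 - pm))
    return total / len(p_music)


if __name__ == "__main__":
    # Consistent predictions incur no penalty.
    print(hierarchy_violation_penalty([1.0, 0.0], [(0.5, 0.5, 0.0), (0.0, 0.0, 1.0)]))
    # Detection says "probably no music" while loudness says "foreground music".
    print(hierarchy_violation_penalty([0.2], [(0.7, 0.1, 0.2)]))
```

A hinge form like this is zero for hierarchy-consistent predictions, so in a training loss it would only push on frames where the two task heads disagree; the actual penalties in the paper may weight or combine the terms differently.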

Keywords:

Task analysis; Event detection; Estimation; Speech processing; Neural networks; Music

Authors:

Bijue Jia, Jiancheng Lv, Xi Peng, Yao Chen, Shenglan Yang

Affiliation:

College of Computer Science, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, China

Publication year:

2021

DOI:

10.1109/TASLP.2020.3030484

IEEE/ACM Transactions on Audio, Speech, and Language Processing

ISSN: 2329-9290

Year, volume (issue): 2021, 29(1)

Times cited: 1
Number of references: 52