Hierarchical Regulated Iterative Network for Joint Task of Music Detection and Music Relative Loudness Estimation

One practical requirement of music copyright management is estimating the relative loudness of music, which existing music detection work mostly ignores. To address this, we study the joint task of music detection and music relative loudness estimation. Specifically, we observe that the joint task has two characteristics, temporality and hierarchy, that can help solve it. For example, a tiny fragment of audio is temporally related to its neighboring fragments because they may all belong to the same event, and the event classes of a fragment in the two tasks have a hierarchical relationship. Based on these observations, we reformulate the joint task as a hierarchical event detection and localization problem. To solve it, we propose the Hierarchical Regulated Iterative Network (HRIN), which has two variants, HRIN-r and HRIN-cr, based on recurrent and convolutional recurrent modules, respectively. To exploit the joint task's characteristics, our models employ an iterative framework for temporal modeling and use three hierarchical violation penalties to regulate the hierarchy. Extensive experiments on the largest dataset currently available (i.e., OpenBMAT) show the promising performance of HRIN in both segment-level and event-level evaluations.

Task analysis, Event detection, Estimation, Speech processing, Neural networks, Music

Bijue Jia, Jiancheng Lv, Xi Peng, Yao Chen, Shenglan Yang

College of Computer Science, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, China

2021

IEEE/ACM transactions on audio, speech, and language processing

ISSN: 2329-9290
Year, volume (issue): 2021, 29(1)