Polyphonic Optical Music Recognition with Enhanced Prior Information on Note Position and Direction

Original title (Chinese): 强化音符位置及方向先验信息的多音光学乐谱识别

Abstract: Polyphonic scores, in which several notes sound at the same moment, place noteheads close together and create complex dependencies between symbols, which makes polyphonic optical music recognition (OMR) highly challenging. In traditional methods based on convolution and sequence modeling, the shift invariance of classical convolution makes it difficult to represent the vertical position of notes accurately, and conventional context sequence modeling struggles to capture the spatial correlation between accidentals in the key signature and noteheads on the staff. As a result, notehead pitch is recognized inaccurately and the scope of accidentals is limited, which degrades the accuracy of pitch and duration labeling. To address these problems, a polyphonic OMR method that strengthens prior information on note position and direction is proposed. First, a vertical position encoding method embeds vertical position information into the score image to represent notehead positions more precisely, so that different pitches in a polyphonic score can be clearly distinguished. Second, an accidental position attention (coordinate attention) mechanism explicitly establishes the spatial dependency between accidentals and noteheads. Finally, because polyphonic noteheads are distributed vertically, note sequences are arranged horizontally, and noteheads, stems, and flags show local directionality, a directional attention module is proposed to better capture the directional distribution of note features. Experiments on a polyphonic score dataset show that the method achieves a symbol error rate of 1.14% for duration (note length) recognition and 2.14% for pitch recognition. Compared with the current baseline, a convolutional recurrent neural network (CRNN), the symbol error rate is reduced by 0.67% for duration recognition and by 1.14% for pitch recognition, indicating good recognition performance on polyphonic scores.
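The abstract reports results as a symbol error rate but does not define it. The sketch below assumes the definition commonly used in OMR work: the total edit (Levenshtein) distance between predicted and reference symbol sequences, divided by the total number of reference symbols. The function names and the toy pitch labels are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i reference symbols with an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference symbol missing from hypothesis
                            curr[j - 1] + 1,      # extra symbol in hypothesis
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def symbol_error_rate(refs: Sequence[Sequence[str]],
                      hyps: Sequence[Sequence[str]]) -> float:
    """Assumed definition: summed edit distance / summed reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of sequences")
    total_symbols = sum(len(r) for r in refs)
    if total_symbols == 0:
        raise ValueError("reference sequences contain no symbols")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_symbols


# Toy example with hypothetical pitch labels: one wrong symbol out of four.
reference = [["C4", "E4", "G4", "D4"]]
predicted = [["C4", "E4", "A4", "D4"]]
ser = symbol_error_rate(reference, predicted)
print(f"{ser:.2%}")
```

Under this assumed definition, pitch and duration would each be scored by running the same function on the pitch label sequences and the duration label sequences separately.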

Keywords: optical music recognition (OMR); position encoding; coordinate attention; direction attention

Authors: Guan Xin (关欣), Liu Jinjin (刘津津), Liu Hui (刘辉), Li Qiang (李锵)

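Affiliations:

School of Microelectronics, Tianjin University, Tianjin 300072, China

School of Music and Film, Tianjin Normal University, Tianjin 300382, China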

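Chinese keywords: 光学乐谱识别 (optical music recognition); 位置编码 (position encoding); 位置注意力 (position attention); 方向注意力 (direction attention)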

Publication year: 2025

DOI: 10.11784/tdxbz202402001

Journal: Journal of Tianjin University (天津大学学报), Tianjin University

Indexing: Peking University Core Journals (北大核心)

Impact factor: 0.793

ISSN: 0493-2137

Year, volume (issue): 2025, 58(1)