Polyphonic Optical Music Recognition with Enhanced Prior Information on Note Position and Direction
Polyphonic optical music recognition (OMR) is exceptionally challenging because noteheads lie close together and the symbols in polyphonic music have complex dependencies. Traditional convolutional methods struggle to represent the vertical position of notes accurately because of the inherent shift invariance of classical convolution. Moreover, conventional context-sequence modeling methods have difficulty representing the spatial correlation between accidentals and noteheads within the staff. As a result, note pitches are recognized inaccurately and the modeled scope of accidental effects is limited, which compromises the accuracy of the pitch and length annotations of notes. To address these issues, this paper proposes a polyphonic OMR method that enhances prior information on note position and direction. First, a vertical position encoding method embeds vertical position information into music score images, enabling precise differentiation of pitches in polyphonic music. Second, a coordinate attention mechanism is introduced for accidentals to establish the spatial dependency between accidentals and noteheads. Finally, to account for the vertical distribution of polyphonic noteheads, the horizontal arrangement of note sequences, and the directional characteristics of noteheads, stems, and tails, a directional attention module is proposed to better capture the directional distribution of note features. Experiments on a polyphonic dataset show that the proposed method achieves a symbol error rate of 1.14% for length recognition and 2.14% for pitch recognition. Compared with state-of-the-art convolutional recurrent neural networks (CRNNs), the proposed approach reduces the symbol error rate by 0.67% for length recognition and 1.14% for pitch recognition. These results demonstrate the superior performance of the method for polyphonic optical music recognition.
Keywords: optical music recognition (OMR); position encoding; coordinate attention; direction attention
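The abstract does not give implementation details. The following is therefore only a minimal Python sketch, built on our own assumptions, of two ideas it names. The first is a vertical position encoding that depends only on the row index. When it is added to a feature map, identical noteheads at different staff heights no longer produce identical activations. The second is a coordinate-attention-style gate that pools each channel along the width (one value per row) and along the height (one value per column), so the reweighting keeps track of both row and column. Several choices here are assumptions for illustration, not the authors' method: the function names, the Transformer-style sinusoidal form of the encoding, and the parameter-free simplification of coordinate attention. The published coordinate attention block applies learned 1×1 convolutions before its sigmoid gates.

```python
import math
from typing import List

# Hypothetical layout: a feature map is indexed as fmap[channel][row][col] (C x H x W).
FeatureMap = List[List[List[float]]]


def vertical_position_encoding(channels: int, height: int) -> List[List[float]]:
    """Sinusoidal table (C x H) that depends only on the row index y.

    Assumption: Transformer-style encoding, sin on even channels, cos on odd ones.
    """
    table = []
    for c in range(channels):
        freq = 1.0 / (10000 ** ((c - c % 2) / channels))
        table.append([math.sin(y * freq) if c % 2 == 0 else math.cos(y * freq)
                      for y in range(height)])
    return table


def add_vertical_encoding(fmap: FeatureMap) -> FeatureMap:
    """Add the row-dependent encoding to every column of each row."""
    n_ch, h = len(fmap), len(fmap[0])
    pe = vertical_position_encoding(n_ch, h)
    return [[[v + pe[c][y] for v in row] for y, row in enumerate(fmap[c])]
            for c in range(n_ch)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def coordinate_attention(fmap: FeatureMap) -> FeatureMap:
    """Simplified, parameter-free coordinate attention (no learned layers).

    Each channel is average-pooled along the width (H row descriptors) and along
    the height (W column descriptors); their sigmoids jointly reweight each position.
    """
    out = []
    for ch in fmap:
        h, w = len(ch), len(ch[0])
        row_desc = [sum(row) / w for row in ch]
        col_desc = [sum(ch[y][x] for y in range(h)) / h for x in range(w)]
        a_h = [sigmoid(v) for v in row_desc]
        a_w = [sigmoid(v) for v in col_desc]
        out.append([[ch[y][x] * a_h[y] * a_w[x] for x in range(w)]
                    for y in range(h)])
    return out


# Toy one-channel map: the same "notehead" pixel placed at row 1 or at row 2.
high = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
low = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]]

enc_high = add_vertical_encoding(high)
enc_low = add_vertical_encoding(low)
ca_high = coordinate_attention(high)
print(enc_high[0][1][1], enc_low[0][2][1], ca_high[0][1][1])
```

Without the encoding, the notehead activation is the same value (1.0) at either height. After the encoding is added, the two values differ (1 + sin 1 versus 1 + sin 2 for a single channel), so later layers can, in principle, tell the pitches apart. In a real network, both operations would act on learned CNN features rather than raw pixels.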