Generating lyrics that fit a given melody requires learning the correspondence between lyrics and melody, as well as the pronunciation rules, semantic relationships, and logical structure within the lyrics. It has long been a challenging research topic in artificial intelligence and music creation. Unfortunately, paired datasets with melody-lyric alignment are very limited, which hinders further research on automatic lyric generation, especially for Chinese songs. To address this problem, this paper used a multi-layer attention network (i.e., a Transformer) to learn the correspondence between lyrics and melody, and used a pre-trained language model to alleviate the scarcity of lyric data. First, lyric generation was modeled as a conditional text generation task. The model then integrated and encoded the pitch and duration of a given melody and fed the result into a language model. Finally, after the melody and lyrics were aligned as sentence pairs, the parameters of the language model were fine-tuned so that it could learn efficiently from the lyric data. Experimental results showed that the proposed melody-to-lyric generation model achieved significant improvements over the baseline model on five evaluation metrics: language fluency, semantic integrity, rhyme degree, melody-emotion matching, and title-lyric semantic consistency.
lyric generation; deep learning; pre-trained language model