Lyrics generation via pre-trained language model
Generating appropriate lyrics according to a melody requires the ability to discover and learn the correspondence between lyrics and melody, as well as the pronunciation rules, semantic relationships, and logical structure within the lyrics; it has long been a challenging research topic in the fields of artificial intelligence and music creation. Unfortunately, paired datasets with melody-lyric alignment are very limited, hindering further research on automatic lyrics generation methods, especially research on lyrics generation centered on Chinese songs. To solve this problem, this paper used a multi-layer attention network (i.e., Transformer) to learn the correspondence between lyrics and melody, and used a pre-trained language model to alleviate the scarcity of lyrics data. First, the lyrics generation problem was modeled as a conditional text generation task. The model first integrated and encoded the pitch and duration of a given musical melody, which were then fed into a language model. Finally, after aligning the melody and lyrics in the form of sentence pairs, the parameters of the language model were fine-tuned to achieve efficient learning of the lyrics data. Experimental results showed that the proposed melody-to-lyric generation model achieved significant improvement over the baseline models in terms of five evaluation metrics: language fluency, semantic integrity, rhyme degree, melody-emotion matching, and title-lyric semantic consistency.
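The pipeline the abstract describes — integrate each note's pitch and duration into a condition token, prepend the encoded melody to the aligned lyric line, and fine-tune a language model on the resulting sequences — can be sketched as below. This is a minimal illustrative sketch: the token scheme (`P<pitch>_D<duration>`, `[BOS]`, `[SEP]`, `[EOS]`) and character-level lyric tokenization are assumptions for illustration, not the paper's actual vocabulary or tokenizer.

```python
# Illustrative sketch of melody-conditioned input construction for fine-tuning
# a language model. Token names and the pitch/duration encoding are assumed,
# not taken from the paper.

def encode_melody(notes):
    """Integrate each note's pitch and duration into one condition token.

    notes: list of (midi_pitch, duration) pairs, e.g. duration in 16th notes.
    """
    return [f"P{pitch}_D{dur}" for pitch, dur in notes]

def build_training_sequence(notes, lyric_line):
    """Align one melody phrase with one lyric line as a sentence pair:
    the encoded melody is prepended as the condition, and the model is
    fine-tuned to continue with the lyric after the separator token."""
    return ["[BOS]"] + encode_melody(notes) + ["[SEP]"] + list(lyric_line) + ["[EOS]"]

# Example: a three-note phrase aligned with a three-character lyric line.
seq = build_training_sequence([(60, 4), (62, 4), (64, 8)], "月亮湾")
print(seq)
# ['[BOS]', 'P60_D4', 'P62_D4', 'P64_D8', '[SEP]', '月', '亮', '湾', '[EOS]']
```

In this formulation, generation at inference time amounts to feeding the model everything up to and including `[SEP]` and decoding lyric tokens until `[EOS]`.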

Keywords: lyric generation; deep learning; pre-trained language model

范菁、张珣、刘祥根


College of Literature and Journalism, Sichuan University, Chengdu 610207, Sichuan, China

College of Computer Science, Sichuan University, Chengdu 610065, Sichuan, China


National Natural Science Foundation of China

62206192

2024

Journal of Southwest Minzu University (Natural Science Edition)
Southwest Minzu University

CSTPCD
Impact factor: 0.441
ISSN: 2095-4271
Year, Volume (Issue): 2024, 50(3)