Generic Natural Language Generation Model Based on Skill Network

Original title: 基于技能网络的通用自然语言生成模型

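Abstract (translated from the Chinese): A natural language generation model trained with multi-task learning can handle many different generation tasks with a single model. However, because all tasks share all parameters, it is unclear which skill each part of the model's parameters has learned. To activate different parameters depending on the task, this paper proposes SkillNet-NLG, a generic natural language generation model based on sparse activation. Unlike a conventional dense model, which activates all of its parameters for every task, SkillNet-NLG first pre-defines, for each task, the set of skills needed to complete it, and then selectively activates only the parameters associated with those skills. This design lets the model learn a new task efficiently by correctly selecting the skills relevant to that task. Experiments on Chinese natural language generation tasks show three results. First, using a single model, SkillNet-NLG outperforms the previous best methods on four of five common natural language generation tasks. Second, it outperforms two other kinds of multi-task baselines (a dense model and a Mixture-of-Experts model) and performs comparably to models trained separately for each task. Third, when applied to new tasks, it achieves better results than all baselines, which supports the effectiveness of the proposed method for learning new tasks.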

English abstract: A natural language generation model trained with multi-task learning can complete various natural language generation tasks with a single model. However, because all model parameters are activated for all tasks, it is unclear which skills are learned by which part of the parameters. To activate different parts of the parameters according to the task, a generic natural language generation model with a sparsely activated approach (SkillNet-NLG) is proposed. A set of skills needed to accomplish a task is pre-defined before the task is performed; the parameters relevant to these skills are then activated, while the other parameters are not. This approach can also learn new tasks efficiently by properly combining the task-related skills. Experimental results on Chinese natural language generation tasks demonstrate that the proposed model outperforms the previous best methods on four of five tasks, outperforms two multi-task learning baselines (a dense model and a Mixture-of-Experts model), and achieves performance comparable to task-specific models. When adapted to new tasks, it surpasses all baseline systems.
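The selective activation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the skill names, the task-to-skill table, the linear "skill modules", and the averaging of active module outputs are all assumptions made for illustration, since the abstract does not specify them.

```python
import random

# Hypothetical skill names and task-to-skill table (not taken from the paper).
ALL_SKILLS = ["general", "compress", "open_ended", "dialogue"]
TASK_SKILLS = {
    "summarization": ["general", "compress"],
    "story_generation": ["general", "open_ended"],
}


class SkillModule:
    """One block of parameters tied to a single skill (a toy linear map)."""

    def __init__(self, dim, rng):
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                        for _ in range(dim)]
        self.calls = 0  # how many times this module's parameters were used

    def forward(self, x):
        self.calls += 1
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]


class SkillLayer:
    """Holds one module per skill and runs only the modules that are requested."""

    def __init__(self, dim, skills, seed=0):
        rng = random.Random(seed)
        self.modules = {name: SkillModule(dim, rng) for name in skills}

    def forward(self, x, active_skills):
        outputs = [self.modules[name].forward(x) for name in active_skills]
        # Assumption: outputs of the active modules are combined by averaging.
        return [sum(vals) / len(outputs) for vals in zip(*outputs)]


def run_task(layer, task, x):
    """Look up the task's pre-defined skills and activate only those modules."""
    return layer.forward(x, TASK_SKILLS[task])


layer = SkillLayer(dim=3, skills=ALL_SKILLS)
y = run_task(layer, "summarization", [1.0, 0.5, -0.5])
used = sorted(name for name, m in layer.modules.items() if m.calls > 0)
print(used)  # ['compress', 'general']
```

In this sketch, supporting a new task amounts to adding one entry to `TASK_SKILLS` that reuses existing skills, which mirrors the abstract's claim about learning new tasks by combining task-related skills.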

Keywords (English):

natural language generation; multi-task model; sparsely activated model; skill network

Authors:

Liao Junwei (廖俊伟), Cheng Shuai (程帅)

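Affiliation:

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731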

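Keywords (Chinese):

自然语言生成 (natural language generation); 多任务模型 (multi-task model); 稀疏激活模型 (sparsely activated model); 技能网络 (skill network)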

Funding:

National Natural Science Foundation of China

Grant number:

61976043

Publication year:

2024

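Journal of Chinese Information Processing (中文信息学报)

Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences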

CSTPCD; CHSSCD; Peking University Core Journals (北大核心)

Impact factor: 0.8

ISSN: 1003-0077

Year, volume (issue): 2024, 38(3)

Number of references: 35