Question Generation Model Based on Text Knowledge Enhancement

Abstract: Pre-trained language models, trained on large-scale data with very large-scale computing power, can learn a large amount of knowledge from unstructured text. To address the limited information contained in triples, a question generation method is proposed that draws on the rich knowledge in pre-trained language models. First, a textual knowledge generator uses the knowledge in the pre-trained language model to enhance the triple information, converting the information in the triples into subgraph descriptions and enriching their semantics. Then, a question type predictor predicts the question word, which accurately locates the domain of the answer, so that semantically correct questions are generated and the generation is better controlled. Finally, a controlled generation framework constrains the key entities and question words, ensuring that both appear in the question and making the generated questions more accurate. The model is evaluated on the public datasets WebQuestion and PathQuestion. Experimental results show that, compared with the existing model LFKQG, the proposed model improves BLEU-4, METEOR, and ROUGE-L by 0.28, 0.16, and 0.22 percentage points, respectively, on WebQuestion, and by 0.8, 0.39, and 0.46 percentage points, respectively, on PathQuestion.

Keywords:

natural language understanding; question generation; knowledge graph (KG); pre-trained language model; knowledge enhancement

Authors:

Chen Jiayu (陈佳玉), Wang Yuanlong (王元龙), Zhang Hu (张虎)

Affiliation:

School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China

Funding:

National Natural Science Foundation of China

Grant number:

62176145

Publication year:

2024

DOI:

10.19678/j.issn.1000-3428.0068081

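Computer Engineering (计算机工程)

East China Institute of Computing Technology; Shanghai Computer Society

CSTPCD; Peking University Core Journal

Impact factor: 0.581

ISSN: 1000-3428

Year, volume (issue): 2024, 50(6)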