Computer Engineering, 2024, Vol. 50, Issue (6): 86-93. DOI: 10.19678/j.issn.1000-3428.0068081

Question Generation Model Based on Text Knowledge Enhancement

CHEN Jiayu¹, WANG Yuanlong¹, ZHANG Hu¹

Author information

  • 1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006

Abstract

Pre-trained language models, trained on large-scale data with massive computing power, can learn a great deal of knowledge from unstructured text. Because a knowledge-graph triplet carries only limited information, this paper proposes a question generation method that uses the knowledge stored in pre-trained language models to enrich it. First, a textual knowledge generator draws on the knowledge embedded in the pre-trained model to convert the information in the triplets into subgraph descriptions, enriching their semantics. Next, a question type predictor predicts the question word, which pins down the domain of the answer; this yields semantically correct questions and gives finer control over the generation results. Finally, a controlled generation framework constrains the key entities and the question word so that both are guaranteed to appear in the generated question, making it more accurate. The model is evaluated on the public datasets WebQuestion and PathQuestion. Compared with the existing model LFKQG, it improves BLEU-4, METEOR, and ROUGE-L by 0.28, 0.16, and 0.22 percentage points, respectively, on WebQuestion, and by 0.8, 0.39, and 0.46 percentage points, respectively, on PathQuestion.

Keywords

natural language understanding / question generation / Knowledge Graph (KG) / pre-trained language model / knowledge enhancement


Funding

National Natural Science Foundation of China (62176145)

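Publication year

2024

Journal

Computer Engineering
Sponsors: East China Institute of Computing Technology; Shanghai Computer Society
Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.581
ISSN: 1000-3428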