
Question Generation Model Based on Text Knowledge Enhancement

Pre-trained language models, trained on large-scale data with extensive computing power, can extract large amounts of knowledge from unstructured text. To address the limited information contained in triples, a question generation method is proposed that exploits the rich knowledge in pre-trained language models. First, a textual knowledge generator is designed to enrich the semantics of a triple by converting its information into a subgraph description, drawing on the extensive knowledge embedded in the pre-trained model. Second, a question type predictor selects the appropriate question word; question words are essential for question generation because they accurately locate the domain of the answer, yielding semantically correct questions and better control over the generation process. Finally, a controlled generation framework constrains both the key entities and the question word to appear in the generated question, making the generated questions more accurate. The performance of the proposed model is validated on the public datasets WebQuestion and PathQuestion. Experimental results show that, compared with the existing LFKQG model, the proposed model improves the BLEU-4, METEOR, and ROUGE-L metrics by 0.28, 0.16, and 0.22 percentage points, respectively, on WebQuestion, and by 0.8, 0.39, and 0.46 percentage points, respectively, on PathQuestion.
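The three-stage pipeline described above (triple verbalization, question-type prediction, controlled generation) can be illustrated with a minimal toy sketch. All function names, the lookup table, and the templates below are hypothetical stand-ins for the paper's neural modules, which are built on a pre-trained language model:

```python
# Toy sketch of the three-stage pipeline from the abstract.
# All names, rules, and templates are hypothetical; the actual model
# uses neural components built on a pre-trained language model.

def describe_triple(triple):
    """Textual knowledge generator (stand-in): verbalize a
    (head, relation, tail) triple as a short subgraph description."""
    head, relation, tail = triple
    return f"{head} has {relation.replace('_', ' ')} {tail}."

def predict_question_word(relation):
    """Question type predictor (stand-in): map the relation to a
    question word via a toy lookup table."""
    table = {"place_of_birth": "where", "date_of_birth": "when",
             "spouse": "who", "profession": "what"}
    return table.get(relation, "what")

def generate_question(triple):
    """Controlled generation (stand-in): constrain both the question
    word and the key entity (the head) to appear in the output."""
    head, relation, _ = triple
    qword = predict_question_word(relation)
    question = f"{qword.capitalize()} is the {relation.replace('_', ' ')} of {head}?"
    # Enforce the constraint described in the abstract.
    assert qword.capitalize() in question and head in question
    return question

triple = ("Barack Obama", "place_of_birth", "Honolulu")
print(describe_triple(triple))
print(generate_question(triple))
```

In the actual model these constraints would be enforced during decoding (e.g., by restricting the beam search), not by template filling; the sketch only shows how the three components fit together.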

natural language understanding; question generation; Knowledge Graph (KG); pre-trained language model; knowledge enhancement

Chen Jiayu, Wang Yuanlong, Zhang Hu


School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China


National Natural Science Foundation of China

62176145

2024

Computer Engineering
East China Institute of Computing Technology; Shanghai Computer Society

Indexed in: CSTPCD; PKU Core Journals
Impact factor: 0.581
ISSN:1000-3428
Year, Volume (Issue): 2024, 50(6)