Automatic generation of reading comprehension questions based on semantic graphs

Xu Jian¹

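Author information

1. Key Laboratory of Educational Informatization for Nationalities, Ministry of Education, Yunnan Normal University, Kunming 650500, Yunnan, China; School of Information Engineering, Qujing Normal University, Qujing 655011, Yunnan, China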

Abstract

Automatic question generation is an artificial intelligence technique that aims to emulate the human ability to produce relevant questions from input text. Current research generates questions mainly from general-purpose datasets, and little work targets the education domain specifically. This article therefore focuses on automatic question generation for middle school students. First, it constructs RACE4QG, a dataset designed for training question generation models and tailored to the needs of middle school education. Second, it develops an end-to-end automatic question generation model that is trained on RACE4QG and uses an improved encoder-decoder scheme. The encoder consists mainly of a two-layer bidirectional gated recurrent unit whose input is the embeddings of the words and of the answer tags. A gated self-attention mechanism applied to the encoder's hidden layer produces a joint passage-answer representation, which is then fed to the decoder to generate questions. Experimental results show that the model outperforms the strongest baseline, improving BLEU-4, ROUGE-L, and METEOR by 3.61, 1.66, and 1.44 points, respectively.
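The abstract names a gated self-attention step over the encoder's hidden states but does not give its equations. The following is a minimal sketch of one common formulation, assumed here for illustration rather than taken from the paper: each state attends to all states, and a sigmoid gate blends the attended candidate with the original state. The function name, the parameter names `W_s`, `W_f`, `W_g`, the absence of bias terms, and the toy dimensions are all hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_self_attention(U, W_s, W_f, W_g):
    """Refine encoder states U (T vectors of size d).

    Assumed shapes: W_s is d x d, W_f and W_g are d x 2d.
    Returns T refined vectors of size d.
    """
    refined = []
    for u_t in U:
        # Score every position against the current one, then normalise.
        q = matvec(W_s, u_t)
        weights = softmax([sum(a * b for a, b in zip(u_j, q)) for u_j in U])
        # Context vector: attention-weighted sum of all encoder states.
        s_t = [sum(w * u_j[k] for w, u_j in zip(weights, U))
               for k in range(len(u_t))]
        concat = u_t + s_t  # list concatenation, length 2d
        f_t = [math.tanh(z) for z in matvec(W_f, concat)]  # candidate state
        g_t = [sigmoid(z) for z in matvec(W_g, concat)]    # gate in (0, 1)
        # The gate decides how much of the candidate replaces the original.
        refined.append([g * f + (1.0 - g) * u
                        for g, f, u in zip(g_t, f_t, u_t)])
    return refined


# Toy usage with random values standing in for trained parameters.
rng = random.Random(0)
T, d = 5, 4


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


U = rand_matrix(T, d)  # stand-in for the bidirectional GRU outputs
out = gated_self_attention(U, rand_matrix(d, d),
                           rand_matrix(d, 2 * d), rand_matrix(d, 2 * d))
print(len(out), len(out[0]))  # 5 4
```

In this sketch the output keeps the shape of the input, so the refined states could be passed to a decoder's attention in place of the raw encoder states. In a real model the matrices would be learned and the computation batched in a tensor library.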

Key words

semantic graph/dataset/automatic question generation model/encoder/decoder/answer tagging/graph attention network/gated recurrent units

Funding

National Natural Science Foundation of China (62166050)

Yunnan Normal University Graduate Research Innovation Fund (2020) (YSDBS178)

Publication year

2024

CAAI Transactions on Intelligent Systems

Chinese Association for Artificial Intelligence; Harbin Engineering University

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.672

ISSN: 1673-4785

Number of references: 31
