Journal of Chinese Information Processing, 2024, Vol. 38, Issue 5: 32-40.

Machine Reading Comprehension Based on Shared Structure Information between Naming and Telling

Han Yujiao (韩玉蛟), Luo Zhiyong (罗智勇), Zhang Mingming (张明明), Zhao Zhilin (赵志琳), Zhang Qing (张青)

Author information

  • School of Information Science, Beijing Language and Culture University, Beijing 100083 (all authors)

Abstract

Machine reading comprehension (MRC) tests a machine's ability to understand natural language by asking it to answer questions about a given context. Neural MRC models built on large-scale pretrained language models have made substantial progress, but answer-extraction accuracy still needs improvement when the answer elements, clue elements, and question elements are related across punctuation sentences (segments delimited by punctuation marks such as commas, semicolons, and periods) and over long distances. This paper analyzes the Naming-Telling structure of a discourse to establish long-distance relations between punctuation sentences and to complete the components they share but leave unexpressed, which assists answer extraction. On this basis, a machine reading comprehension model that integrates Naming-Telling structure information is designed and implemented. Experiments on the public CMRC2018 dataset show that, compared with the baseline model, the proposed model improves the F1 score by 2.4% and the EM (exact match) score by 6%.
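The central step described above, completing a punctuation sentence with the component it shares with earlier context, can be illustrated with a small sketch. It assumes the newline-indentation representation from Naming-Telling theory, in which each punctuation sentence starts on a new line indented under the material it shares with the clause above; here that indentation is given as a character offset. The abstract does not say how the paper predicts these offsets, so in this sketch they are supplied by hand, and the names `split_punctuation_sentences` and `build_nt_clauses` are hypothetical.

```python
import re
from typing import List

# Chinese punctuation marks assumed to close a punctuation sentence
# (comma, period, semicolon, exclamation mark, question mark).
PUNCT = "，。；！？"


def split_punctuation_sentences(text: str) -> List[str]:
    """Split text into punctuation sentences, keeping each closing mark."""
    parts = re.findall(rf"[^{PUNCT}]+[{PUNCT}]?", text)
    # Drop whitespace-only fragments such as a trailing newline.
    return [p for p in parts if p.strip()]


def build_nt_clauses(sentences: List[str], offsets: List[int]) -> List[str]:
    """Prefix each punctuation sentence with the material it shares.

    offsets[i] is the (hypothetical, hand-supplied) indentation of sentence i:
    how many leading characters of the previous completed clause it inherits.
    An offset of 0 means the sentence is already self-contained.
    """
    if len(sentences) != len(offsets):
        raise ValueError("need exactly one offset per punctuation sentence")
    clauses: List[str] = []
    for i, (sentence, offset) in enumerate(zip(sentences, offsets)):
        if i == 0 or offset == 0:
            clauses.append(sentence)
        else:
            # Inherit the shared prefix (e.g. the topic) from the clause above.
            clauses.append(clauses[i - 1][:offset] + sentence)
    return clauses


if __name__ == "__main__":
    text = "张三是学生，喜欢读书，每天去图书馆。"
    sentences = split_punctuation_sentences(text)
    for clause in build_nt_clauses(sentences, [0, 2, 2]):
        print(clause)
```

The example sentence means "Zhang San is a student, likes reading, and goes to the library every day." Its third punctuation sentence lacks the subject 张三, and the sketch restores it as 张三每天去图书馆。. For a question such as 谁每天去图书馆？ ("Who goes to the library every day?"), the answer and the clue words then sit in the same clause. This is the kind of cross-sentence link the abstract describes.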
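The reported gains are in EM and F1, which CMRC2018 computes at the character level for Chinese. The sketch below is a simplified version of these metrics. It assumes punctuation-stripped character segmentation and, following our understanding of the CMRC2018 evaluation script, an F1 based on the longest common substring between the prediction and each reference answer, keeping the best score over references. The official script treats English words and punctuation in its own way, so numbers from this sketch should not be compared directly with published results.

```python
import string
from typing import List

# Punctuation ignored when comparing answers. This set is an assumption;
# the official CMRC2018 script defines its own list.
PUNCTUATION = set(string.punctuation) | set("，。；：！？、“”‘’（）《》【】")


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and whitespace, split into characters."""
    return [ch for ch in text.lower() if ch not in PUNCTUATION and not ch.isspace()]


def longest_common_substring(a: List[str], b: List[str]) -> int:
    """Length of the longest contiguous run of tokens shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def exact_match(prediction: str, answers: List[str]) -> float:
    """1.0 if the normalized prediction equals any normalized reference."""
    pred = normalize(prediction)
    return float(any(normalize(ans) == pred for ans in answers))


def f1_score(prediction: str, answers: List[str]) -> float:
    """Best LCS-based F1 of the prediction against any reference answer."""
    pred = normalize(prediction)
    best = 0.0
    for ans in answers:
        gold = normalize(ans)
        overlap = longest_common_substring(gold, pred)
        if overlap == 0:
            continue  # no shared characters: F1 is 0 for this reference
        precision = overlap / len(pred)
        recall = overlap / len(gold)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


if __name__ == "__main__":
    print(exact_match("张三。", ["张三"]))  # 1.0: punctuation is ignored
    print(f1_score("张三同学", ["张三"]))  # about 0.667 (precision 0.5, recall 1.0)
```

Dataset-level EM and F1 are the means of these per-question scores and are usually reported as percentages.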

Keywords

machine reading comprehension / Naming-Telling structure analysis / attention mechanism / pretrained language model


Funding

National Natural Science Foundation of China (62076037)

Year of publication

2024
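Journal of Chinese Information Processing
Sponsors: Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences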

Indexed in: CSTPCD, CSCD, CHSSCD, Peking University Core Journals
Impact factor: 0.8
ISSN:1003-0077
Number of references: 1