Two-stage Document Filtering and Asynchronous Multi-granularity Graph Multi-hop Question Answering
张雪松 1, 李冠君 2, 聂士佳 1, 张大伟 2, 吕钊 3, 陶建华 4
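Author information
- 1. School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
- 2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
- 3. School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601
- 4. Department of Automation, Tsinghua University, Beijing 100084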
Abstract
Multi-hop question answering aims to predict the answer to a question, together with the supporting facts for that answer, by reasoning over the content of multiple documents. However, in the document filtering task, current multi-hop question answering methods try to find all documents related to the question without considering whether each of those documents actually helps in finding the answer. This paper therefore proposes a two-stage document filtering method. In the first stage, documents are scored and a low threshold is set so that as many question-related documents as possible are retained, which ensures high document recall. In the second stage, the reasoning path to the answer is modeled and documents are selected again from the first-stage results, which ensures high document precision. In addition, for the multi-granularity graph built from the documents, a novel asynchronous update mechanism is proposed for answer prediction and supporting fact prediction. The mechanism divides the multi-granularity graph into a heterogeneous graph and a homogeneous graph and updates them asynchronously to better support multi-hop reasoning. The proposed method outperforms current mainstream multi-hop question answering methods, which verifies its effectiveness.
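As a rough illustration of the two-stage filtering idea summarized in the abstract, the Python sketch below first keeps every document whose relevance score clears a low threshold (favoring recall), then selects documents hop by hop along a reasoning path (favoring precision). It is not the authors' implementation: the function name `two_stage_filter`, the `hop_score` callback, the greedy path construction, and the toy scores are all assumptions made for illustration, since the abstract does not specify how the reasoning path is modeled.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical scorer type: rates a candidate document given the path chosen so far.
HopScorer = Callable[[List[str], str], float]


def two_stage_filter(
    doc_ids: Sequence[str],
    relevance: Dict[str, float],
    hop_score: HopScorer,
    threshold: float = 0.1,
    max_hops: int = 2,
) -> List[str]:
    """Return an ordered list of selected document ids (illustrative only)."""
    # Stage 1: a deliberately low threshold keeps most related documents (high recall).
    candidates = [d for d in doc_ids if relevance[d] >= threshold]

    # Stage 2 (assumed greedy variant): extend the reasoning path one hop at a
    # time, choosing the candidate that best continues the current path.
    path: List[str] = []
    remaining = list(candidates)
    for _ in range(max_hops):
        if not remaining:
            break
        best = max(remaining, key=lambda d: hop_score(path, d))
        path.append(best)
        remaining.remove(best)
    return path


# Toy data (made up): question-relevance scores and pairwise hop scores.
docs = ["a", "b", "c", "d"]
relevance = {"a": 0.9, "b": 0.05, "c": 0.4, "d": 0.3}
links = {(None, "a"): 1.0, ("a", "c"): 0.2, ("a", "d"): 0.8}


def toy_hop_score(path: List[str], doc: str) -> float:
    prev: Optional[str] = path[-1] if path else None
    return links.get((prev, doc), 0.0)


selected = two_stage_filter(docs, relevance, toy_hop_score)
print(selected)  # ['a', 'd']
```

In this toy run, document "b" is dropped in the first stage, and "d" is preferred over "c" in the second stage even though "c" has the higher relevance score, because "d" continues the path from "a" more strongly.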
Key words
multi-hop question answering/document filtering/multi-granularity graph/asynchronous update/answer prediction
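Funding
National Key Research and Development Program of China (2020AAA0140003)
Open Research Project of Zhejiang Lab (2021KH0AB06)
Program of the Beijing Municipal Science and Technology Commission and the Administrative Commission of Zhongguancun Science Park (Z211100004821013)
Publication year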
2024