
A Question Generation Method Based on Subgraph Paraphrasing

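This paper proposes a subgraph paraphrasing method to address the problem of unseen predicates in question generation over knowledge graphs. Traditional question generation methods mainly rely on annotated question-answering data (question-logical form pairs) to generate questions. However, annotated data can hardly cover every predicate in a knowledge graph, so generating questions for unseen predicates remains a challenge. This paper proposes a semantic decoupling method based on subgraph structure: the knowledge graph subgraph corresponding to a complex question is decomposed into atomic subgraphs, so that a multi-hop subgraph containing unseen predicates is split into single-hop subgraphs that are easy to handle. This paper also designs a subgraph paraphrasing method that samples predicates in the dataset to obtain subgraph description texts and trains a subgraph paraphraser on large-scale unsupervised data. The paraphraser provides natural-language expressions for subgraphs containing unseen predicates, which supplies effective information for question generation. This paper quantitatively analyzes model performance at different difficulty levels, and experimental results on GrailQA and other datasets show that the proposed method achieves state-of-the-art performance.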
A Question Generation Method Based on Subgraph Paraphrase
This paper proposes a method based on subgraph rephrasing to solve the problem of unseen predicates in question generation over knowledge graphs. Traditional KBQG (Question Generation over Knowledge Base) methods mainly use annotated Q&A (Question and Answer) data (question-logical form pairs) to generate questions. However, annotated data cannot fully cover all predicates in the knowledge graph, so generating questions with unseen predicates remains a challenge. In this paper, we propose a semantic decoupling method based on subgraph structure. By decomposing the subgraph corresponding to a complex question into atomic subgraphs, a multi-hop subgraph containing unseen predicates can be divided into single-hop subgraphs that are easy to handle. In addition, we design a subgraph rephrasing procedure that samples predicates in the dataset to obtain subgraph descriptions and trains a subgraph rewriter on large-scale unsupervised data. The subgraph rewriter provides natural-language forms for subgraphs containing unseen predicates, which gives effective information for generating questions. This paper quantitatively analyzes the performance of the model at different difficulty levels. The experimental results on GrailQA and other datasets show that our method achieves state-of-the-art performance.
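The decomposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a subgraph is represented as a list of (head, predicate, tail) triples and treats each triple as one atomic single-hop subgraph. The names `Triple`, `decompose`, and `unseen_predicates`, as well as the Freebase-style predicates in the example, are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    """One edge of a knowledge graph subgraph (assumed representation)."""
    head: str
    predicate: str
    tail: str


def decompose(subgraph: List[Triple]) -> List[List[Triple]]:
    """Split a multi-hop subgraph into atomic single-hop subgraphs,
    one per triple, preserving the original order."""
    return [[triple] for triple in subgraph]


def unseen_predicates(subgraph: List[Triple], seen: Set[str]) -> List[str]:
    """Return the predicates in the subgraph that never occurred in the
    annotated training data, sorted for a stable result."""
    return sorted({t.predicate for t in subgraph if t.predicate not in seen})


# Illustrative two-hop subgraph: "?x was directed by ?y, who was born in London".
two_hop = [
    Triple("?x", "film.film.directed_by", "?y"),
    Triple("?y", "people.person.place_of_birth", "London"),
]
seen_in_training = {"film.film.directed_by"}  # assumed training coverage

atomic = decompose(two_hop)
unseen = unseen_predicates(two_hop, seen_in_training)
```

After decomposition, each single-hop subgraph can be handled separately, so the one carrying the unseen predicate is isolated from the rest of the multi-hop structure.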

subgraph sampling; subgraph representation; unseen predicates; question generation; knowledge graph

温立强、熊冠铭、王宇、陈一朴、李伟平、赵文


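School of Software and Microelectronics, Peking University, Beijing 102600

National Engineering Research Center for Software Engineering, Peking University, Beijing 100871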

subgraph sampling; subgraph paraphrasing; unseen predicates; question generation; knowledge graph

National Key Research and Development Program of China

2021YFC3340301

2024

Acta Electronica Sinica
Chinese Institute of Electronics


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.237
ISSN: 0372-2112
Year, volume (issue): 2024, 52(10)