基于"预训练+微调"范式的实体关系联合抽取方法依赖大规模标注数据,在数据标注难度大、成本高的中文古籍小样本场景下微调效率低,抽取性能不佳;中文古籍中普遍存在实体嵌套和关系重叠的问题,限制了实体关系联合抽取的效果;管道式抽取方法存在错误传播问题,影响抽取效果.针对以上问题,提出一种基于提示学习和全局指针网络的中文古籍实体关系联合抽取方法.首先,利用区间抽取式阅读理解的提示学习方法对预训练语言模型(PLM)注入领域知识以统一预训练和微调的优化目标,并对输入句子进行编码表示;其次,使用全局指针网络分别对主、客实体边界和不同关系下的主、客实体边界进行预测和联合解码,对齐成实体关系三元组,并构建了PTBG(Prompt Tuned BERT with Global pointer)模型,解决实体嵌套和关系重叠问题,同时避免了管道式解码的错误传播问题;最后,在上述工作基础上分析了不同提示模板对抽取性能的影响.在《史记》数据集上进行实验的结果表明,相较于注入领域知识前后的OneRel模型,PTBG模型所取得的F1值分别提升了1.64和1.97个百分点.可见,PTBG模型能更好地对中文古籍实体关系进行联合抽取,为低资源的小样本深度学习场景提供了新的研究思路与方法.
Joint entity-relation extraction method for ancient Chinese books based on prompt learning and global pointer network
Joint entity-relation extraction methods based on the "pre-training + fine-tuning" paradigm rely on large-scale annotated data. In the small-sample setting of ancient Chinese books, where data annotation is difficult and costly, fine-tuning is inefficient and extraction performance is poor; nested entities and overlapping relations are common in ancient Chinese texts, limiting the effectiveness of joint extraction; and pipeline extraction methods suffer from error propagation, which further degrades results. To address these problems, a joint entity-relation extraction method for ancient Chinese books based on prompt learning and a global pointer network was proposed. Firstly, a span-extraction reading comprehension style of prompt learning was used to inject domain knowledge into the Pre-trained Language Model (PLM), unifying the optimization objectives of pre-training and fine-tuning, and the input sentences were encoded with this model. Then, global pointer networks were used to predict the boundaries of subject and object entities, as well as the subject and object boundaries under each relation, and the predictions were jointly decoded and aligned into entity-relation triples, completing the PTBG (Prompt Tuned BERT with Global pointer) model; in this way, nested entities and overlapping relations were handled while the error propagation of pipeline decoding was avoided. Finally, on this basis, the influence of different prompt templates on extraction performance was analyzed. Experimental results on a Records of the Grand Historian dataset show that, compared with the OneRel model before and after domain-knowledge injection, the PTBG model improves the F1 score by 1.64 and 1.97 percentage points, respectively. These results demonstrate that the PTBG model extracts entity-relation triples from ancient Chinese books more effectively and offers a new approach for low-resource, small-sample deep learning scenarios.
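As a concrete illustration of the boundary-prediction step, below is a minimal sketch of a global pointer span scorer in the spirit of the model described above; it is not the paper's released code. The class name GlobalPointerScorer, the head size, and the plain bilinear score are assumptions (the published GlobalPointer additionally applies rotary position embeddings, omitted here for brevity). A companion sketch after the keywords illustrates the joint decoding step.

```python
# Minimal sketch of a global pointer span scorer, assuming a BERT-style
# encoder with hidden size `hidden_size`. Names and the head size are
# illustrative assumptions, not the paper's released code.
import torch
import torch.nn as nn

class GlobalPointerScorer(nn.Module):
    """Scores every candidate span (i, j) for each of `num_types` types,
    so nested spans can be predicted directly rather than via BIO tags."""

    def __init__(self, hidden_size: int, num_types: int, head_size: int = 64):
        super().__init__()
        self.num_types = num_types
        self.head_size = head_size
        # One (query, key) projection pair per type.
        self.proj = nn.Linear(hidden_size, num_types * head_size * 2)

    def forward(self, hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # hidden: [B, L, H] token encodings from the prompt-tuned PLM
        # mask:   [B, L], 1 for real tokens, 0 for padding
        B, L, _ = hidden.shape
        qk = self.proj(hidden).view(B, L, self.num_types, 2, self.head_size)
        q, k = qk[..., 0, :], qk[..., 1, :]            # each [B, L, T, d]
        # Bilinear score for every (start=i, end=j) pair of every type.
        scores = torch.einsum('bmtd,bntd->btmn', q, k) / self.head_size ** 0.5
        # Mask out padding positions and spans with end < start.
        pad = mask[:, None, :, None].bool() & mask[:, None, None, :].bool()
        scores = scores.masked_fill(~pad, -1e12)
        lower = torch.ones(L, L, device=hidden.device).tril(-1).bool()
        scores = scores.masked_fill(lower, -1e12)
        # Span (i, j) of type t is predicted when scores[b, t, i, j] > 0.
        return scores  # [B, T, L, L]
```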
Keywords: joint entity-relation extraction; global pointer network; prompt learning; Pre-trained Language Model (PLM); ancient Chinese books
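The joint decoding that aligns span predictions into triples can be sketched as follows, assuming GPLinker-style outputs, which match the abstract's description: one [2, L, L] tensor scoring subject and object spans, and two [R, L, L] tensors linking subject/object heads and tails under each of R relations. The tensor layouts, names, and the 0 decision threshold are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of the joint decoding step: align subject/object spans with
# relation-specific head and tail links into (subject, relation, object)
# triples. Tensor layouts and the 0 threshold are assumptions.
import numpy as np

def decode_triples(entity_scores: np.ndarray,  # [2, L, L]; 0: subject spans, 1: object spans
                   head_scores: np.ndarray,    # [R, L, L]; head_scores[r, sh, oh] links heads
                   tail_scores: np.ndarray,    # [R, L, L]; tail_scores[r, st, ot] links tails
                   threshold: float = 0.0):
    # Candidate spans are all (start, end) cells whose score clears the threshold.
    subjects = list(zip(*np.where(entity_scores[0] > threshold)))
    objects = list(zip(*np.where(entity_scores[1] > threshold)))
    triples = []
    for sh, st in subjects:
        for oh, ot in objects:
            # A relation holds only when both its head link and its tail link
            # fire, so one subject can pair with several objects under several
            # relations -- overlapping triples are recovered in a single pass,
            # with no pipeline stage to propagate errors.
            rels = np.where((head_scores[:, sh, oh] > threshold) &
                            (tail_scores[:, st, ot] > threshold))[0]
            triples.extend(((sh, st), int(r), (oh, ot)) for r in rels)
    return triples
```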