
Self-Supervised Taxonomy Completion Based on Prior Knowledge-Guided Prompt Learning

Existing seed taxonomies in various domains are incomplete, and a considerable number of new domain terms emerge over time, so these taxonomies must be completed automatically. Existing self-supervised taxonomy completion methods rely on graph embedding techniques: they do not fully exploit the rich semantic information provided by pre-trained language models, and they focus only on local node relationships in the graph while ignoring the information contained in the overall graph structure. To address these problems, a self-supervised taxonomy completion model based on prior knowledge-guided prompt learning, named Pro-tax, is proposed. The model integrates the semantic information of a pre-trained language model with the structural information of the seed taxonomy. First, exploiting the fact that a query node forms coarse-grained triples along its vertical path, the construction strategy of the self-supervised dataset is improved. Second, for large samples, matching is performed under the pre-training and fine-tuning paradigm. During fine-tuning, to strengthen the attention of the pre-trained language model to the true hypernyms, prior knowledge attention over synonyms or abbreviations of the true hypernyms is integrated into the prompt, so that the prompt guides the fine-tuning process more effectively. During matching, soft beam search rules are adopted to reduce time complexity: on the local graph structure, node embeddings generated under prompt guidance are used to evaluate the query confidence of sibling nodes at the same level, whereas on the global graph structure, a vertical-path walk is used for path truncation, sorting, and filtering. Third, for few-shot cases, matching based on prompt learning is used, and different template combinations and in-context demonstrations are employed to fine-tune the pre-trained language model. Finally, experimental results on large public datasets from four different domains show that, compared with the baseline models, the Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hit@10 of Pro-tax improve by 15%, 0.057, and 0.030, respectively.
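The improved self-supervised dataset construction can be pictured with a short sketch. This is a minimal illustration, assuming the seed taxonomy is a networkx DiGraph with parent-to-child edges; the function name make_triples and the integer granularity encoding are hypothetical, not taken from the paper.

```python
# Minimal sketch: build (query, hypernym, granularity) triples by walking
# each query node's vertical (root-to-leaf) path. Direct parents give
# fine-grained positives; higher ancestors give coarse-grained triples.
import networkx as nx

def make_triples(taxonomy: nx.DiGraph):
    """Yield (query, hypernym, granularity) triples from vertical paths.

    granularity 0 = direct parent; granularity k > 0 = k-th ancestor
    on the same vertical path (a coarse-grained positive).
    """
    for query in taxonomy.nodes:
        for parent in taxonomy.predecessors(query):
            yield (query, parent, 0)            # fine-grained positive
            ancestor, depth = parent, 1
            preds = list(taxonomy.predecessors(ancestor))
            while preds:
                ancestor = preds[0]             # follow one vertical path upward
                yield (query, ancestor, depth)  # coarse-grained triple
                depth += 1
                preds = list(taxonomy.predecessors(ancestor))

# Usage on a toy taxonomy: science -> computer_science -> machine_learning
g = nx.DiGraph([("science", "computer_science"),
                ("computer_science", "machine_learning")])
print(list(make_triples(g)))
```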
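The prior knowledge-guided prompt can likewise be sketched. The template wording below, and the way synonyms or abbreviations of a candidate hypernym are appended as prior knowledge, are assumptions made for illustration; only the general idea of cloze-style scoring with a masked language model follows the abstract.

```python
# Hedged sketch: score a candidate hypernym for a query term with a cloze
# prompt that also carries prior-knowledge terms (synonyms/abbreviations).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def score_hypernym(query: str, candidate: str, prior_terms: list) -> float:
    """Return the MLM log-probability that `candidate` fills [MASK] in a
    prompt enriched with prior-knowledge terms (template is illustrative)."""
    prior = ", ".join(prior_terms)
    prompt = f"{query} is a kind of [MASK]. {candidate} is also known as {prior}."
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**inputs).logits
    mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    # Use the candidate's first sub-token as a simple single-slot proxy.
    cand_id = tok.convert_tokens_to_ids(tok.tokenize(candidate)[0])
    return torch.log_softmax(logits[0, mask_pos], dim=-1)[cand_id].item()

# e.g. score_hypernym("convolutional neural network", "neural network",
#                     ["NN", "artificial neural network"])
```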
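The soft beam search used at matching time can be approximated as a beam search over the taxonomy. This is a sketch under stated assumptions: `confidence(query, node)` stands in for the prompt-guided score from the fine-tuned model (e.g., the scorer above), the taxonomy is a networkx-style DiGraph, and the beam width and truncation depth are illustrative hyperparameters rather than the paper's settings.

```python
# Sketch: keep only the top `beam_width` vertical paths at each depth,
# scoring sibling nodes locally with the prompt-guided confidence and
# truncating paths at `max_depth` to bound the number of model calls.
import heapq

def soft_beam_search(taxonomy, query, confidence, beam_width=5, max_depth=3):
    """Return the top-scoring candidate attachment paths for `query`."""
    roots = [n for n in taxonomy.nodes if taxonomy.in_degree(n) == 0]
    beam = [(confidence(query, r), [r]) for r in roots]
    results = []
    for _ in range(max_depth):
        expanded = []
        for score, path in beam:
            children = list(taxonomy.successors(path[-1]))
            if not children:
                results.append((score, path))   # leaf: path is complete
                continue
            for child in children:              # siblings scored vs. the query
                expanded.append((score + confidence(query, child), path + [child]))
        if not expanded:
            break
        beam = heapq.nlargest(beam_width, expanded, key=lambda x: x[0])
    results.extend(beam)
    return heapq.nlargest(beam_width, results, key=lambda x: x[0])
```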
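For the few-shot setting, a prompt assembled from a template plus in-context demonstrations might look like the following; the exact templates and demonstration format used by Pro-tax are not given in the abstract, so this wording is hypothetical.

```python
# Illustrative few-shot prompt assembly: demonstrations are (child, parent)
# pairs drawn from the seed taxonomy, followed by the query in a cloze slot.
def build_few_shot_prompt(query, demos):
    """demos: list of (child, parent) pairs used as in-context examples."""
    lines = [f"{child} is a kind of {parent}." for child, parent in demos]
    lines.append(f"{query} is a kind of [MASK].")
    return " ".join(lines)

# e.g. build_few_shot_prompt("random forest",
#                            [("decision tree", "classifier"),
#                             ("k-means", "clustering algorithm")])
```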

taxonomy completion; prior knowledge; prompt learning; self-supervised; pre-trained language model

陈志强、仇瑜、朱宇、王晓英


Department of Computer Technology and Applications, Qinghai University, Xining 810000, Qinghai, China

Beijing Zhipu Huazhang Technology Co., Ltd., Beijing 100084, China


2024

Computer Engineering (计算机工程)
East China Institute of Computing Technology; Shanghai Computer Society


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.581
ISSN: 1000-3428
Year, Volume (Issue): 2024, 50(12)