Self-Supervised Taxonomy Completion Based on Prior Knowledge-Guided Prompt Learning
Owing to the incompleteness of existing seed taxonomies in various fields and the continual emergence of new domain terms over time, seed taxonomies must be completed automatically. Existing self-supervised taxonomy completion methods rely on graph embedding techniques; however, they do not fully exploit the rich semantic information provided by pre-trained language models, and they focus only on local node relationships in the graph while ignoring the information contained in the overall graph structure. To address these problems, a self-supervised taxonomy completion model based on prior knowledge-guided prompt learning, named Pro-tax, is proposed. The model integrates the semantic information of a pre-trained language model with the structural information of the seed taxonomy. First, the strategy for building the self-supervised dataset is improved based on the coarse-grained triplet characteristics of the query node on the vertical path. Second, for large samples, matching based on the pre-training and fine-tuning paradigm is used. To strengthen the pre-trained language model's attention to true hypernyms during the fine-tuning stage, prior knowledge attention over the synonyms or abbreviations of the true hypernyms is integrated into the prompt; the prompt can therefore guide the fine-tuning of the pre-trained language model more effectively. During the matching stage, soft beam search rules are adopted to reduce time complexity: in the local graph structure, node embeddings generated under prompt guidance are used to evaluate the query's confidence over sibling nodes at the same level, whereas in the global graph structure, a vertical-path walk is used for path interception and ranking-based filtering. Third, for few-shot settings, matching based on prompt learning is used, and different template combinations and in-context demonstrations are employed to fine-tune the pre-trained language model. Finally, experimental results on large public datasets from four different domains show that, compared with the comparison models, Pro-tax improves the Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hit@10 metrics by 15%, 0.057, and 0.030, respectively.
taxonomy completion; prior knowledge; prompt learning; self-supervised; pre-trained language model
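To make the two mechanisms sketched in the abstract concrete, the following is a minimal illustration of prior knowledge-guided prompt scoring, assuming a HuggingFace masked language model. The cloze template, the "also known as" injection of the prior knowledge, and the function name `score_hypernym` are all illustrative assumptions, not the paper's released code or exact prompt design.

```python
from typing import Optional

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def score_hypernym(query: str, candidate: str, prior: Optional[str] = None) -> float:
    """Log-likelihood that `candidate` fills the hypernym slot for `query`.

    `prior` is an optional synonym or abbreviation of the candidate hypernym;
    injecting it into the prompt is a stand-in for the paper's prior-knowledge
    attention, which steers the PLM toward the true hypernym.
    """
    cand_ids = tokenizer(candidate, add_special_tokens=False)["input_ids"]
    masks = " ".join([tokenizer.mask_token] * len(cand_ids))
    hint = f", also known as {prior}," if prior else ""
    prompt = f"{query} is a kind of {masks}{hint}."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        log_probs = model(**inputs).logits[0].log_softmax(dim=-1)
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    # Average per-token log-probability of the candidate at its mask positions.
    return sum(log_probs[p, t].item() for p, t in zip(mask_pos, cand_ids)) / len(cand_ids)
```

A soft beam search over the seed taxonomy could then, under the same caveat that this is a generic sketch rather than the paper's exact rules, look roughly as follows: walk the taxonomy level by level along vertical paths, score each child with the prompt-based confidence, and keep only the top-k siblings at each level.

```python
def soft_beam_search(query: str, children: dict, score_fn, beam_width: int = 5):
    """Rank candidate anchor nodes for `query` without scanning the full graph.

    `children` maps each node to its child nodes (the seed taxonomy as a tree);
    `score_fn(query, node)` is a prompt-guided confidence such as the one above.
    """
    beam = [("root", 0.0)]  # (node, accumulated path score)
    ranked = []
    while beam:
        level = [
            (child, path_score + score_fn(query, child))
            for node, path_score in beam
            for child in children.get(node, [])
        ]
        # Sibling pruning: only the top-k nodes per level stay on the beam,
        # so the walk grows with taxonomy depth rather than total node count.
        beam = sorted(level, key=lambda x: -x[1])[:beam_width]
        ranked.extend(beam)
    return sorted(ranked, key=lambda x: -x[1])
```

The pruning step mirrors the abstract's evaluation of sibling-node confidence in the local graph structure, while the level-by-level descent corresponds to the vertical-path walk used for path interception and filtering in the global graph structure.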