
A Low-Resource Named Entity Recognition Method for Cultural Heritage Field Incorporating Knowledge Fusion
(Original title: 融合知识的文博领域低资源命名实体识别方法研究)

Cultural relics data exhibit pronounced entity nesting, entity boundaries are not unique, and annotated data in the cultural heritage field are extremely scarce; together these factors lead to poor named entity recognition (NER) performance in this field. To address these issues, we construct FewRlicsData, a dataset for NER on cultural relics, and propose RelicsNER, a knowledge-enhanced NER method for low-resource settings in the cultural heritage field. The method integrates the semantic knowledge carried by category descriptions into the cultural relics text, decodes with a span-based approach to handle nested entities, and applies boundary smoothing to alleviate the overconfidence of the span recognition model. Compared with the baseline model, the proposed method achieves a higher F1 score on FewRlicsData and performs well on NER in the cultural heritage field. Experimental results on the public dataset OntoNotes 4.0 indicate that the method generalizes well. In addition, small-scale data experiments on OntoNotes 4.0 and MSRA show that it outperforms the baseline model, which indicates that it is suitable for low-resource scenarios.
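The abstract names two mechanisms, span-based decoding and boundary smoothing, without giving their exact form. The following is a minimal sketch of one common formulation, not the authors' implementation. It assumes inclusive `(start, end)` token spans, a smoothing mass `epsilon`, and a maximum Manhattan distance `max_dist`; the function names and default values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def enumerate_spans(n_tokens: int, max_len: int) -> List[Span]:
    """List every candidate span of at most max_len tokens.

    Each span is scored independently, so a span and a span
    nested inside it can both be labelled as entities.
    """
    return [
        (start, end)
        for start in range(n_tokens)
        for end in range(start, min(start + max_len, n_tokens))
    ]


def smooth_boundaries(
    gold: Span,
    n_tokens: int,
    epsilon: float = 0.2,  # assumed value, not taken from the paper
    max_dist: int = 1,     # assumed value, not taken from the paper
) -> Dict[Span, float]:
    """Build a soft target that moves epsilon of the gold span's
    probability mass onto spans whose boundaries are slightly off."""
    gold_start, gold_end = gold
    target: Dict[Span, float] = {gold: 1.0 - epsilon}
    for dist in range(1, max_dist + 1):
        # Valid spans at Manhattan distance `dist` from the gold span.
        ring = [
            (s, e)
            for s in range(gold_start - dist, gold_start + dist + 1)
            for e in range(gold_end - dist, gold_end + dist + 1)
            if abs(s - gold_start) + abs(e - gold_end) == dist
            and 0 <= s <= e < n_tokens
        ]
        # Each distance ring shares an equal slice of epsilon.
        for span in ring:
            target[span] = epsilon / max_dist / len(ring)
    # Mass from rings with no valid span stays on the gold span.
    target[gold] += 1.0 - sum(target.values())
    return target


if __name__ == "__main__":
    print(enumerate_spans(3, 2))
    print(smooth_boundaries((2, 3), n_tokens=6))
```

Training against such a soft target instead of a one-hot label penalizes near-miss boundaries less harshly, which is the usual motivation for using boundary smoothing to counter overconfident span predictions.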

Keywords: cultural heritage field; named entity recognition; knowledge fusion; attention mechanism

Li Chao (李超), Hou Xia (侯霞), Qiao Xiuming (乔秀明)


School of Computer Science, Beijing Information Science and Technology University, Beijing 100192


Funding: Beijing Natural Science Foundation (4224090)

2024

Acta Scientiarum Naturalium Universitatis Pekinensis (北京大学学报(自然科学版))
Peking University

Indexed in: CSTPCD, PKU Core Journals (北大核心)
Impact factor: 0.785
ISSN: 0479-8023
Year, volume (issue): 2024, 60(1)