Computer Systems & Applications, 2024, Vol. 33, Issue 9: 38-47. DOI: 10.15888/j.cnki.csa.009605

Entity Recognition for Interpretation of Bone-sign Integrated with Multiple Features

Shi Yumeng¹, Wang Huiqin¹, Wang Zhan², Liu Rui³, Wang Ke¹

Author information

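  • 1. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055
  • 2. Shaanxi Institute for the Preservation of Cultural Heritage, Xi'an 710075
  • 3. Institute of Archaeology, Chinese Academy of Social Sciences, Beijing 100101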

Abstract

Some bone-sign interpretations (transcriptions of the texts inscribed on bone signs) from Han Chang'an City lack key content and therefore cannot be classified. To address this, this study builds a named entity recognition (NER) model tailored to these interpretations. The original text of the Han Chang'an City bone-sign interpretations serves as the dataset, and the entities are annotated with the BIOE (begin, inside, outside, end) tagging scheme. The study proposes a multi-feature fusion network (MFFN) that fuses character structure features with character-word structure features. It considers the structure of each individual character and also integrates the structure of character-word combinations, which strengthens the model's understanding of the interpretations. Experimental results show that MFFN identifies the named entities in the Han Chang'an City bone-sign interpretations better than existing NER models and makes it possible to classify the interpretations. This gives historians and researchers richer and more accurate data.
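The abstract names the BIOE tagging scheme but does not show it. The Python sketch below illustrates one common reading of that scheme. The first character of an entity gets B-, interior characters get I-, the last character gets E-, and every other character gets O. Several details are assumptions for illustration and are not taken from the paper: the sample string written in the style of a bone-sign text, the entity labels (TIME, ORG), the function names `spans_to_bioe` and `bioe_to_spans`, and the choice to tag a one-character entity with a lone B-. They do not reflect the paper's label set or annotation conventions.

```python
from typing import List, Optional, Tuple

# (start, end_exclusive, label); positions are character indices.
Span = Tuple[int, int, str]


def spans_to_bioe(text: str, spans: List[Span]) -> List[str]:
    """Tag every character of `text` as B-/I-/E-<label> or O.

    Assumption: a one-character entity is tagged B-<label> alone,
    because plain BIOE has no separate single-character tag.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(t != "O" for t in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{label}"
        if end - start > 1:
            tags[end - 1] = f"E-{label}"
    return tags


def bioe_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIOE tags back into spans (lenient: an entity with no E tag
    ends just before the next tag that does not continue it)."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:  # close the previous (e.g. one-character) entity
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif start is not None and tag == f"I-{label}":
            continue  # still inside the current entity
        elif start is not None and tag == f"E-{label}":
            spans.append((start, i + 1, label))
            start, label = None, None
        else:  # "O" or a tag that does not continue the open entity
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


if __name__ == "__main__":
    text = "始元六年河南工官"  # hypothetical example string
    tags = spans_to_bioe(text, [(0, 4, "TIME"), (4, 8, "ORG")])
    print(tags)
    # ['B-TIME', 'I-TIME', 'I-TIME', 'E-TIME', 'B-ORG', 'I-ORG', 'I-ORG', 'E-ORG']
    print(bioe_to_spans(tags))
    # [(0, 4, 'TIME'), (4, 8, 'ORG')]
```

Round-tripping spans through tags, as the demo does, is a cheap way to check that an annotation file is internally consistent before training a sequence-labeling model on it.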

Key words

bone-sign/entity recognition/BIOE annotation method/multi-feature fusion/interpretation classification


Funding

National Social Science Fund of China, Special Program for Research on Rare and Endangered Disciplines (20VJXT001)

Publication year

2024
Computer Systems & Applications (Institute of Software, Chinese Academy of Sciences)
CSTPCD
Impact factor: 0.449
ISSN: 1003-3254
Number of references: 8