基于词汇增强的典型文物命名实体识别算法

A lexicon-enhanced named entity recognition algorithm for typical cultural relics

Cui Xin (崔鑫)¹, Wang Yan (王琰)², Hou Xiaogang (侯小刚)², Zhou Yue (周月)³

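Author affiliations

1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876
2. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876
3. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876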

Abstract

Named entity recognition (NER) for typical cultural relics extracts entities such as relic name, dynasty, excavation site, and holding institution from sentences. Typical cultural relic data has distinctive word-formation patterns, so existing NER methods applied to such datasets run into problems such as misjudged word boundaries. This paper proposes a lexicon-enhanced NER algorithm for typical cultural relics that introduces lexical information in both the input representation layer and the contextual encoding layer, making the word information more domain-specific. A lexicon of cultural relic domain terms is constructed and used as the dictionary of the algorithm; this largely resolves the word boundary errors and achieves good results on the typical cultural relics dataset.
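The abstract does not say how lexicon words are attached to characters, so the following is only a minimal sketch of one common option: for each character, collect the lexicon words in which it appears at the beginning, middle, or end, or as a single-character word (B/M/E/S sets, in the style of SoftLexicon). The function name `match_lexicon`, the `max_len` parameter, and the toy lexicon entries are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def match_lexicon(sentence, lexicon, max_len=8):
    """Collect, for every character, the lexicon words that cover it.

    Returns one dict per character with keys "B", "M", "E", "S" mapping
    to sets of matched words (assumed scheme, not the paper's).
    """
    features = [defaultdict(set) for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every substring starting here, up to max_len characters.
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                features[start]["S"].add(word)
            else:
                features[start]["B"].add(word)
                features[end - 1]["E"].add(word)
                for mid in range(start + 1, end - 1):
                    features[mid]["M"].add(word)
    return features


# Hypothetical cultural-relic lexicon and sentence, for illustration only.
lexicon = {"商代", "后母戊鼎", "青铜器", "鼎"}
sentence = "商代后母戊鼎"
feats = match_lexicon(sentence, lexicon)

for ch, f in zip(sentence, feats):
    print(ch, {tag: sorted(words) for tag, words in f.items()})
```

In a sketch like this, the multi-character relic name from the domain lexicon marks its own start and end characters, which is the kind of boundary cue a character-level tagger would otherwise lack. How the matched words are embedded and fed into the input representation and contextual encoding layers is left out, since the abstract gives no detail on it.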

Key words

lexicon enhanced/domain thesaurus/named entity recognition

Funding

National Key Research and Development Program of China (2021TFF0901701)

Publication year

2023

Journal of Communication University of China (Science and Technology)

Communication University of China

CHSSCD

Impact factor: 0.514

ISSN: 1673-4793

Number of references: 1
