
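Named Entity Recognition of Local Chronicles Literature in Traditional Chinese Opera Based on Multi-dimensional Feature Analysis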

Local chronicles are a form of regional documentation unique to China and of exceptional historical value. Digitizing them and mining the knowledge they contain is important for inheriting and disseminating traditional Chinese culture and for building a culturally strong nation. As a fundamental technology and a key step, named entity recognition (NER) strongly affects how knowledge in local chronicles is organized and discovered. Although NER for local chronicles has made some progress, a systematic technical solution adapted to both the textual features of these documents and the characteristics of their domain resources is still lacking. This study therefore proposes an NER model for local chronicles on traditional Chinese opera that integrates multi-dimensional features with Bi-LSTM-CRF. First, syntactic features are combined with textual features (symbols, tail words or suffixes, word formation, context, and negative examples) to analyze the distinctive traits of opera entities in local chronicles. Second, the Bi-LSTM-CRF model, which performs well on long text structures, draws on these parsed opera-entity features to improve entity recognition. Finally, an empirical study on the "Chu Opera Chronicles" (《楚剧志》) shows that the proposed model outperforms the baseline model in named entity recognition, achieving an F1 score of 0.869.
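The abstract names the textual feature types only at category level, so the templates below are not the authors'. As a minimal sketch, assuming character-level BIO tagging, this Python code turns the symbol, tail-word, context, and negative-example features into a per-character feature dictionary, with a coarse character-type feature as a crude stand-in for word formation. The lexicons `OPERA_SUFFIXES` and `NEGATIVE_EXAMPLES`, the helper names, and all feature names are hypothetical; syntactic features would need a parser and are left out.

```python
from typing import Dict, List

# Hypothetical lexicons for illustration only (not the paper's resources).
# Longer suffixes come first so the most specific tail word wins.
OPERA_SUFFIXES = ("剧团", "戏班", "剧", "腔", "调")
# Hypothetical negative examples: strings that end like an opera entity but are not one.
NEGATIVE_EXAMPLES = ("剧本", "剧场")


def _find_all(text: str, sub: str) -> List[int]:
    """Return every start offset of sub in text (overlaps allowed)."""
    starts, pos = [], text.find(sub)
    while pos != -1:
        starts.append(pos)
        pos = text.find(sub, pos + 1)
    return starts


def char_type(ch: str) -> str:
    """Coarse character class, a crude stand-in for word-formation features."""
    if "\u4e00" <= ch <= "\u9fff":
        return "han"
    if ch.isdigit():
        return "digit"
    return "other"


def char_features(sentence: str, i: int, window: int = 1) -> Dict[str, object]:
    """Build a feature dict for sentence[i]."""
    ch = sentence[i]
    feats: Dict[str, object] = {"char": ch, "type": char_type(ch)}

    # Symbol feature: character lies strictly inside Chinese title marks 《...》.
    open_pos = sentence.rfind("《", 0, i)
    close_pos = sentence.rfind("》", 0, i)
    feats["in_title_marks"] = (
        ch not in "《》" and open_pos > close_pos and sentence.find("》", i + 1) != -1
    )

    # Tail-word feature: a known suffix ends exactly at this character.
    feats["suffix"] = next(
        (s for s in OPERA_SUFFIXES if sentence.endswith(s, 0, i + 1)), ""
    )

    # Negative-example feature: character is covered by a known non-entity string.
    feats["in_negative"] = any(
        start <= i < start + len(neg)
        for neg in NEGATIVE_EXAMPLES
        for start in _find_all(sentence, neg)
    )

    # Context features: neighbouring characters, padded at sentence boundaries.
    for k in range(-window, window + 1):
        if k == 0:
            continue
        j = i + k
        if 0 <= j < len(sentence):
            feats[f"c{k:+d}"] = sentence[j]
        else:
            feats[f"c{k:+d}"] = "<S>" if j < 0 else "</S>"
    return feats


def sentence_features(sentence: str) -> List[Dict[str, object]]:
    """Feature dicts for every character of a sentence."""
    return [char_features(sentence, i) for i in range(len(sentence))]


if __name__ == "__main__":
    # Invented example sentence: "The Chu opera troupe performs 《葛麻》".
    for feats in sentence_features("楚剧团演出《葛麻》"):
        print(feats)
```

In a feature-augmented Bi-LSTM-CRF such features are commonly one-hot encoded or embedded and concatenated with the character embeddings before the Bi-LSTM; the abstract does not say how the authors inject them.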
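The CRF layer is what separates Bi-LSTM-CRF from a plain Bi-LSTM tagger: instead of choosing the best tag for each character independently, it scores whole tag sequences with transition scores and decodes the best one. The following is a minimal Viterbi decoding sketch in plain Python, assuming the Bi-LSTM has already produced per-character emission scores. The BIO tag set, all scores, and the -10 penalty on the invalid O to I-OPERA transition are made up for illustration; CRF training (its log-likelihood loss) and start/end transition scores are omitted.

```python
from typing import Dict, List, Sequence


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag path.

    emissions[t][y]: score of tag y at position t (e.g. from a Bi-LSTM).
    transitions[a][b]: score of moving from tag a to tag b.
    """
    if not emissions:
        return []
    # score[y]: best total score of any path ending in tag y at the current position.
    score = {y: emissions[0][y] for y in tags}
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        new_score: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for y in tags:
            best_prev = max(tags, key=lambda p: score[p] + transitions[p][y])
            new_score[y] = score[best_prev] + transitions[best_prev][y] + emissions[t][y]
            pointer[y] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda y: score[y])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    tags = ["O", "B-OPERA", "I-OPERA"]
    transitions = {a: {b: 0.0 for b in tags} for a in tags}
    transitions["O"]["I-OPERA"] = -10.0  # invalid in the BIO scheme
    emissions = [
        {"O": 1.0, "B-OPERA": 0.8, "I-OPERA": 0.0},
        {"O": 0.2, "B-OPERA": 0.1, "I-OPERA": 1.0},
    ]
    print(viterbi_decode(emissions, transitions, tags))
```

On these toy scores a per-character argmax yields O, I-OPERA, an invalid BIO sequence, whereas Viterbi returns B-OPERA, I-OPERA: that path scores 0.8 + 0 + 1.0 = 1.8, while O, I-OPERA scores 1.0 - 10 + 1.0 = -8.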

Keywords: local chronicles literature; local chronicles on traditional Chinese opera; named entity recognition; Bi-LSTM-CRF; multi-dimensional feature analysis

Zhai Shanshan (翟姗姗), Yu Huajuan (余华娟), Chen Jianyao (陈健瑶), Xia Lixin (夏立新)


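School of Information Management, Central China Normal University, Wuhan 430079

Intelligent Computing Laboratory for Cultural Heritage, Wuhan University, Wuhan 430072

University of Wisconsin-Milwaukee, Milwaukee 53202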


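Funding: General Project of the National Social Science Fund of China (20BTQ071); Open Fund Project of the Intelligent Computing Laboratory for Cultural Heritage, Wuhan University, a Ministry of Education Laboratory of Philosophy and Social Sciences (2023ICLCH007)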

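Journal of the China Society for Scientific and Technical Information (情报学报), 2024, 43(9)

Sponsors: China Society for Scientific and Technical Information; Institute of Scientific and Technical Information of China

Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals (北大核心)

Impact factor: 1.296

ISSN: 1000-0135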