Chinese Named Entity Recognition Based on Multi-level Features of Chinese Characters and Local Features of Text

To address the low accuracy of current Chinese named entity recognition models in complex contexts, this paper adds more Chinese character features to compensate for the weaknesses of word vectors in representing character form and pronunciation, thereby introducing more prior knowledge and enriching the semantic features. It also designs an encoder that takes both global and local features into account, improving the robustness and generalization of the model in complex contexts. Experimental results show that the proposed method improves the F1 value by 1.61, 0.37, 0.98, and 0.98 on the Weibo, OntoNotes 5.0, Boson, and People Daily datasets, respectively, which confirms the importance and generality of the features of Chinese characters themselves and shows that local features of text help improve model performance. In addition, the influence of eight different Chinese character encoding methods on model performance is explored: compared with single pinyin characters, the initials and finals of Chinese characters carry more pronunciation information, and features such as tone and polyphonic characters also help improve model performance. Finally, the model is tested on a variety of text examples, and the results demonstrate the effectiveness of the proposed work.
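The contrast the abstract draws between single pinyin characters and initial/final/tone features can be illustrated with a minimal sketch. It is not the paper's encoding scheme: the function names `split_pinyin` and `pinyin_features`, the tone-numbered input format (such as `zhong1`), the use of 5 for the neutral tone, and the simplification of treating `y` and `w` as initials are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Two-letter initials come first so "zh" is matched before "z".
# Treating "y" and "w" as initials is a simplifying assumption.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_pinyin(syllable: str) -> Tuple[str, str, int]:
    """Split a tone-numbered pinyin syllable into (initial, final, tone)."""
    syllable = syllable.strip().lower()
    tone = 5  # assumed code for the neutral tone when no digit is given
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    return "", syllable, tone  # zero-initial syllable such as "an"


def pinyin_features(readings: List[str]) -> Dict[str, object]:
    """Build hypothetical per-character features from its list of readings."""
    initial, final, tone = split_pinyin(readings[0])
    return {
        "letters": list(readings[0].rstrip("0123456789")),  # single pinyin characters
        "initial": initial,
        "final": final,
        "tone": tone,
        "polyphonic": len(readings) > 1,  # more than one reading listed
    }


if __name__ == "__main__":
    # Readings are supplied by hand here; a real system would look them up.
    print(pinyin_features(["zhong1", "zhong4"]))
    print(pinyin_features(["an1"]))
```

In the letter-level view, `zhong1` becomes five unrelated symbols, whereas the component view keeps `zh` and `ong` as two phonologically meaningful units and adds the tone and a polyphony flag as separate features.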

character features; pinyin features; local features of text; named entity recognition

张慧、秦董洪、白凤波、罗余特、刘成星、宋蕃桦

School of Artificial Intelligence, Guangxi Minzu University, Nanning, Guangxi 530000

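Guangxi Science and Technology Base and Talent Special Project; Central Government Guided Local Science and Technology Development Fund Project of Guangxi Zhuang Autonomous Region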

桂科AD23026054; 桂科ZY24212045

2024

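Journal of Chinese Information Processing
Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences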

Indexed in: CSTPCD, CHSSCD, Peking University Core Journals (北大核心)
Impact factor: 0.8
ISSN: 1003-0077
Year, volume (issue): 2024, 38(9)