Computer Engineering and Design (计算机工程与设计), 2024, Vol. 45, Issue 1: 261-267. DOI: 10.16208/j.issn1000-7024.2024.01.033

Recognition of named entity in Chinese resume based on ALBERT

Yu Dandan (余丹丹)¹, Huang Jie (黄洁)², Dang Tongxin (党同心)², Zhang Ke (张克)²

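Author information

  • 1. School of Cyberspace Security, Zhengzhou University, Zhengzhou 450003, Henan, China; School of Data and Target Engineering, Strategic Support Force Information Engineering University, Zhengzhou 450001, Henan, China
  • 2. School of Data and Target Engineering, Strategic Support Force Information Engineering University, Zhengzhou 450001, Henan, China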

Abstract

Existing entity recognition methods for electronic resumes have low accuracy. The BERT pre-trained language model achieves higher accuracy, but its large parameter count and long training time limit its practical use. This paper proposes a named entity recognition method for Chinese electronic resumes based on ALBERT. The lightweight ALBERT language model embeds the input text to produce dynamic word vectors, which addresses the problem of polysemy. A BiLSTM then captures contextual structural features and mines deeper semantic relationships. The concatenated vectors are fed to a CRF layer, which learns the constraints between labels and applies Viterbi decoding to output the final label sequence. Experimental results show that the method achieves an F1 score of 94.86% on the Resume electronic resume dataset.
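The Viterbi decoding step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `viterbi_decode`, the three-label tag set (O, B-NAME, I-NAME), and all scores are assumptions chosen for illustration. In a full model the emission scores would come from the BiLSTM output and the transition scores would be learned by the CRF layer.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of label indices.

    emissions[t][j]   : score of label j at position t (hypothetical values here)
    transitions[i][j] : score of moving from label i to label j
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # score[j] = best score of any path that ends in label j at the current position
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score = []
        pointers = []
        for j in range(num_labels):
            # Best previous label for reaching label j at position t
            best_i = max(range(num_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label to recover the full path
    path = [max(range(num_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical label set: 0 = O, 1 = B-NAME, 2 = I-NAME
emissions = [[2.0, 0.5, 0.1],
             [0.1, 0.2, 1.5],
             [1.0, 0.0, 0.0]]
# A large negative score forbids the invalid transition O -> I-NAME
transitions = [[0.0, 0.0, -1e4],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
best_path = viterbi_decode(emissions, transitions)
```

With these toy scores, choosing the best label independently at each position would place I-NAME directly after O. The transition penalty steers the decoder to a sequence that respects the label constraint, which is the role the abstract assigns to the CRF layer.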

Key words

electronic resume / named entity recognition (NER) / pre-trained language model / bidirectional long short-term memory network (BiLSTM) / conditional random field (CRF) / neural network / deep learning

Funding

National Natural Science Foundation of China (62071490)

Year of publication

2024
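
Computer Engineering and Design (计算机工程与设计)
Institute 706, Second Academy of China Aerospace Science and Industry Corporation (中国航天科工集团二院706所)
Indexed in: CSTPCD, Peking University Core Journals (北大核心)
Impact factor: 0.617
ISSN: 1000-7024
Number of references: 9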