
Educational Named Entity Recognition Integrating Word Information and Self-Attention Mechanism

To address the low accuracy of named entity recognition (NER) in the education domain and the severe shortage of annotated corpora, this paper proposes WBBAC, an NER model that integrates word information with self-attention. The model uses the BERT pre-trained language model to enrich the semantic representation of character vectors and augments them with word-frequency information; the character vectors are concatenated with word vectors and fed into a bidirectional long short-term memory (BiLSTM) network, a self-attention layer then captures dependencies within the sequence, and a CRF layer finally decodes the optimal label sequence. A Principles of Computer Organization dataset was built and annotated according to the characteristics of course texts. In experiments on the Resume dataset and the Principles of Computer Organization dataset, WBBAC achieves F1 scores of 95.65% and 73.94%, respectively. The results show that WBBAC attains a higher F1 score than the baseline models and effectively mitigates the shortage of annotated data in education-domain NER.
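The abstract describes a self-attention layer applied to the BiLSTM outputs to capture dependencies within the sequence before CRF decoding. As a minimal sketch only (not the paper's implementation), single-head scaled dot-product self-attention over a sequence of hidden vectors can be written as follows; the matrix names, sequence length, and dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(H, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    H:  (seq_len, d) hidden states, e.g. BiLSTM outputs.
    Wq, Wk, Wv: (d, d_k) projection matrices (learned in practice).
    Returns (seq_len, d_k) context vectors in which every position
    attends to every other position of the sequence.
    """
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len) affinities
    A = softmax(scores, axis=-1)              # each row sums to 1
    return A @ V

# Illustrative sizes: 5-token sequence, hidden size 8, head size 4.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(H, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

In the full model this layer would sit between the BiLSTM and the CRF; the CRF decoding step is omitted here because it requires learned transition scores.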


Zheng Shoumin, Shen Yanguang


School of Information and Electrical Engineering, Hebei University of Engineering, Handan 056004, Hebei, China

named entity recognition; word information; self-attention mechanism; education domain; BERT

2024

Software Guide (软件导刊)
Hubei Provincial Information Society

Impact factor: 0.524
ISSN: 1672-7800
Year, Volume (Issue): 2024, 23(9)