
Educational Named Entity Recognition Integrating Word Information and Self-Attention Mechanism

To address the low accuracy of named entity recognition (NER) in the education domain and the severe shortage of annotated corpora, this paper proposes WBBAC, an NER model that integrates word information with self-attention. The model uses the BERT pre-trained language model to enrich the semantic representation of character vectors and augments them with word-frequency information; the character vectors are concatenated with word vectors and fed into a bidirectional long short-term memory (BiLSTM) network, a self-attention layer then captures dependencies within the sequence, and a CRF layer finally decodes the optimal label sequence. A Principles of Computer Organization dataset was built and annotated according to the characteristics of course texts. In experiments on the Resume dataset and the Principles of Computer Organization dataset, WBBAC achieves F1 scores of 95.65% and 73.94%, respectively. The results show that WBBAC attains a higher F1 score than the baseline models and effectively mitigates the shortage of annotated data in education-domain NER.
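The abstract describes a self-attention layer applied to the BiLSTM outputs to capture dependencies within the sequence before CRF decoding. As a minimal sketch only (not the paper's implementation), single-head scaled dot-product self-attention over a sequence of hidden vectors can be written as follows; the matrix names, sequence length, and dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(H, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    H:  (seq_len, d) hidden states, e.g. BiLSTM outputs.
    Wq, Wk, Wv: (d, d_k) projection matrices (learned in practice).
    Returns (seq_len, d_k) context vectors in which every position
    attends to every other position of the sequence.
    """
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len) affinities
    A = softmax(scores, axis=-1)              # each row sums to 1
    return A @ V

# Illustrative sizes: 5-token sequence, hidden size 8, head size 4.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(H, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

In the full model this layer would sit between the BiLSTM and the CRF; the CRF decoding step is omitted here because it requires learned transition scores.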


Zheng Shoumin, Shen Yanguang


School of Information and Electrical Engineering, Hebei University of Engineering, Handan 056004, Hebei, China

named entity recognition; word information; self-attention mechanism; education domain; BERT

2024

Software Guide (软件导刊)
Hubei Provincial Information Society

Impact factor: 0.524
ISSN: 1672-7800
Year, Volume (Issue): 2024, 23(9)