Educational Named Entity Recognition Integrating Word Information and Self-Attention Mechanism
To address the low accuracy and severe shortage of annotated corpora in named entity recognition tasks in the education domain, a named entity recognition model, WBBAC, that integrates word information and a self-attention mechanism is proposed. The model uses the BERT pre-trained language model to enhance the semantic representation of word vectors and introduces word frequency information into them. The concatenated word vectors are fed into a bidirectional long short-term memory network, and a self-attention layer then captures internal dependencies within the sequence. Finally, the optimal label sequence is obtained through CRF decoding. A computer composition principles dataset is constructed and annotated according to the characteristics of the course text. Experiments are conducted on the Resume dataset and the computer composition principles dataset, on which the WBBAC model achieves F1 scores of 95.65% and 73.94%, respectively. The experimental results show that, compared with the baseline models, the WBBAC model achieves a higher F1 score, effectively addressing the problem of insufficient annotated data in named entity recognition tasks in the education domain.
named entity recognition; word information; self-attention mechanism; education domain; BERT
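To make the pipeline described in the abstract concrete, the following is a minimal PyTorch sketch of the WBBAC architecture: BERT token vectors with an appended word-frequency feature are passed through a BiLSTM, followed by a self-attention layer, emission scoring, and CRF decoding. The layer dimensions, the one-dimensional frequency feature, the tag set size, and the use of the pytorch-crf package are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the WBBAC pipeline (assumed hyperparameters throughout).
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf (assumed CRF implementation)


class WBBAC(nn.Module):
    def __init__(self, bert_dim=768, freq_dim=1, hidden=256, heads=4, num_tags=9):
        super().__init__()
        # BiLSTM over the concatenation [BERT vector ; word-frequency feature]
        self.bilstm = nn.LSTM(bert_dim + freq_dim, hidden,
                              batch_first=True, bidirectional=True)
        # Self-attention layer modeling dependencies within the sequence
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.emit = nn.Linear(2 * hidden, num_tags)   # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)    # CRF decoding layer

    def forward(self, bert_vecs, freq_feats, tags=None, mask=None):
        x = torch.cat([bert_vecs, freq_feats], dim=-1)
        h, _ = self.bilstm(x)
        pad = ~mask if mask is not None else None
        h, _ = self.attn(h, h, h, key_padding_mask=pad)
        emissions = self.emit(h)
        if tags is not None:                            # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)    # inference: best tag paths


# Toy usage with random tensors standing in for BERT outputs.
B, T = 2, 12
model = WBBAC()
bert_vecs = torch.randn(B, T, 768)
freq_feats = torch.rand(B, T, 1)                        # per-token frequency feature
mask = torch.ones(B, T, dtype=torch.bool)
tags = torch.randint(0, 9, (B, T))
loss = model(bert_vecs, freq_feats, tags, mask)
print(loss.item(), model(bert_vecs, freq_feats, mask=mask)[0])
```

In this sketch the BERT encoder is left outside the module and represented by random tensors, so the example runs without downloading pre-trained weights; in practice the contextual vectors would come from a BERT model applied to the input characters or words.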