Recognition of Cotton Pest and Disease Named Entities Based on RoBERTa Multi-feature Fusion
Cotton pest and disease text corpora are scarce, no Chinese named entity recognition (NER) corpus exists for this domain, and cotton pest and disease entities are complex, diverse and unevenly distributed. To address these problems, a Chinese entity recognition corpus, CDIPNER, containing 11 categories of cotton pest and disease entities was constructed, and a named entity recognition model based on RoBERTa multi-feature fusion was proposed. The model uses the RoBERTa pre-trained model, which has a stronger masked-language-modelling ability, to convert characters into character-level embedding vectors. It then extracts feature vectors jointly with BiLSTM and IDCNN models, which capture the temporal and spatial features of the text respectively, fuses the extracted feature vectors with a multi-head self-attention mechanism, and finally generates the predicted tag sequence with a conditional random field (CRF). The model achieved 96.60% precision, 95.76% recall and a 96.18% F1 value on named entities in cotton pest and disease text, and it also performed well on public datasets such as ResumeNER. These results indicate that the model can effectively identify cotton pest and disease named entities and has a degree of generalisation ability.
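The reported figures are internally consistent: the F1 value is the harmonic mean of the 96.60% and 95.76% figures, which identifies the first one as precision. The short check below recomputes it. The function name `f1_score` is our own and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output share units, e.g. %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(96.60, 95.76), 2))  # 96.18
```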
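IDCNN captures wide context cheaply because each dilated convolution widens the window one output position can see. The sketch below computes that receptive field for stacked stride-1 dilated 1-D convolutions. The kernel size 3, the dilation pattern `[1, 1, 2]` and the four repeated blocks are hypothetical settings chosen for illustration. This abstract does not state the paper's actual IDCNN configuration.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int], n_blocks: int = 1) -> int:
    """Number of input tokens visible to one output position.

    Assumes stride-1 1-D convolutions applied in sequence: each layer with
    dilation d widens the window by (kernel_size - 1) * d tokens.
    """
    rf = 1
    for _ in range(n_blocks):  # IDCNN repeats (iterates) the same dilated block
        for d in dilations:
            rf += (kernel_size - 1) * d
    return rf


# Hypothetical setting: 4 iterations of a [1, 1, 2] block with kernel size 3
print(receptive_field(3, [1, 1, 2], n_blocks=4))  # 33
# Same depth (12 layers) without dilation, for comparison
print(receptive_field(3, [1] * 12))  # 25
```

With the same number of layers, dilation lets each character's representation draw on a wider span of the sentence. This is the "spatial" context that complements the sequential view from the BiLSTM.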
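The final CRF step selects the single highest-scoring tag sequence rather than the best tag at each position independently. A minimal sketch of standard linear-chain CRF Viterbi decoding appears below. It assumes per-position emission scores (in the model these would come from the fused features) and a tag-to-tag transition matrix. Start and end transitions are omitted, and the BIO tag set in the example is a hypothetical stand-in for the corpus's actual labelling scheme.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    best = list(emissions[0])  # best[j]: best score of a path ending in tag j
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # Pick the previous tag i maximising best[i] + transitions[i][j]
            i_star = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
            pointers.append(i_star)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical BIO tags: 0 = O, 1 = B, 2 = I; the transition O -> I is penalised
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
emissions = [[0.1, 2.0, 0.3], [0.2, 0.1, 1.5], [1.0, 0.2, 0.4]]
print(viterbi_decode(emissions, transitions))  # ([1, 2, 0], 4.5)
```

The transition matrix lets the CRF rule out invalid label sequences, such as an I tag directly after O. Per-position classification cannot enforce such constraints.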
Keywords: Cotton; Pests and diseases; RoBERTa model; Named entity recognition; Multi-feature fusion; Multi-head attention mechanism