Recognition of Cotton Pests and Diseases Named Entities Based on RoBERTa Multi-feature Fusion

To address the scarcity of text corpora on cotton pests and diseases, the lack of a Chinese named entity recognition (NER) corpus for this domain, and the complexity, diversity, and uneven distribution of cotton pest and disease entities, a Chinese NER corpus, CDIPNER, covering 11 entity categories was constructed, and an NER model based on RoBERTa multi-feature fusion was proposed. The model uses the RoBERTa pre-trained model, which has stronger masked-learning ability, to convert characters into embedding vectors; extracts feature vectors jointly with BiLSTM and IDCNN to capture the temporal and spatial features of the text, respectively; fuses the extracted feature vectors with a multi-head self-attention mechanism; and finally generates the predicted sequence with the CRF algorithm. On cotton pest and disease text, the model achieved a precision of 96.60%, a recall of 95.76%, and an F1 score of 96.18% for named entities; it also performed well on public datasets such as ResumeNER. These results indicate that the model can effectively recognize cotton pest and disease named entities and has a certain generalization ability.
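
The reported scores can be related to one another with a short sketch of entity-level scoring. It assumes a BIO tagging scheme and exact-match span scoring, neither of which is stated in the abstract; the labels `PEST` and `DIS` and the function names are hypothetical and used only for illustration. The last line checks that the reported F1 is the harmonic mean of the reported precision and recall.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (scheme is an assumption)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((etype, start, i))  # close the currently open entity
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]  # open a new entity
    return spans


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def score(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)  # spans whose type and boundaries both match
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    return precision, recall, f1_from_pr(precision, recall)


# Hypothetical tags: the second predicted entity has a wrong right boundary.
gold = ["B-PEST", "I-PEST", "O", "B-DIS", "I-DIS", "I-DIS", "O"]
pred = ["B-PEST", "I-PEST", "O", "B-DIS", "I-DIS", "O", "O"]
print(score(gold, pred))  # (0.5, 0.5, 0.5)

# Consistency check on the figures reported in the abstract.
print(round(f1_from_pr(96.60, 95.76), 2))  # 96.18
```

Under exact-match scoring, a prediction with the right type but a wrong boundary counts as both a false positive and a false negative, which is why the toy example drops to 0.5 on all three metrics.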

Keywords: Cotton; Pests and diseases; RoBERTa model; Named entity recognition; Multi-feature fusion; Multi-head attention mechanism

Li Dongya (李东亚), Bai Tao (白涛), Xiang Huimin (香慧敏), Dai Shuo (戴硕), Wang Zhenlu (王震鲁), Chen Zhen (陈珍)

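College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi, Xinjiang 830052

Engineering Research Center of Intelligent Agriculture, Ministry of Education, Urumqi, Xinjiang 830052

Xinjiang Agricultural Informatization Engineering Technology Research Center, Urumqi, Xinjiang 830052

Xinjiang Kexin Vocational and Technical College, Urumqi, Xinjiang 830049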

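Funding: Science and Technology Innovation 2030 Major Project of the Ministry of Science and Technology (2022ZD0115800); Major Science and Technology Special Project of Xinjiang Uygur Autonomous Region (2022A02011-4); Basic Scientific Research Funds for Universities in Xinjiang Uygur Autonomous Region, Scientific Research Project (XJEDU2022J009)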

2024

Journal of Henan Agricultural Sciences (河南农业科学)
Henan Academy of Agricultural Sciences (河南省农业科学院)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.787
ISSN: 1004-3268
Year, volume (issue): 2024, 53(2)