Intent Recognition of Cotton Disease and Pest Questions Based on ERNIE and Improved DPCNN

No public question dataset exists for cotton diseases and pests, and such questions are short and varied in type. To address this, this study consulted the literature and domain experts to construct CQCls, a cotton disease and pest question dataset that defines 78 cotton disease and pest entities and 9 question types. It also proposes an intent recognition model for these questions based on the ERNIE pre-trained model. The model first maps the input question into a vector space with ERNIE, then extracts feature vectors with a DPCNN that fuses word position information, which improves the model's expressive power relative to the basic DPCNN, and finally obtains the classification result through Softmax. In the experiments, the proposed model outperformed the other models it was compared with, reaching macro-average and weighted-average F1 scores of 97.45% and 97.31%, respectively. On the DMSCD dataset, whose text corpus is complex and diverse and whose formatting is non-standard, the weighted-average F1 score across categories still reached 73.42%, which further supports the effectiveness and generalization ability of the model.
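The abstract reports both a macro-average and a weighted-average F1 score. As a minimal sketch of how these two averages differ, the following pure-Python example computes both from per-class F1. The label names and the toy predictions are hypothetical, chosen only for illustration; they are not taken from CQCls or DMSCD, and this is not the authors' evaluation code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but wrong
            fn[t] += 1  # t was the true label but missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's number of true samples."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[l] * support[l] for l in scores) / len(y_true)


# Hypothetical question-type labels (assumption, for illustration only).
y_true = ["symptom", "symptom", "symptom", "control", "control", "cause"]
y_pred = ["symptom", "symptom", "control", "control", "control", "symptom"]

print(round(macro_f1(y_true, y_pred), 4))
print(round(weighted_f1(y_true, y_pred), 4))
```

In this toy case the rare class "cause" is never predicted correctly, so it pulls the macro average down more than the weighted average. When the two reported values are close, as with 97.45% and 97.31%, performance is likely similar across frequent and infrequent question types.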

Keywords: cotton pests and diseases; question intent recognition; ERNIE model; DPCNN model; word position information

Li Dongya (李东亚), Bai Tao (白涛), Xiang Huimin (香慧敏), Dai Shuo (戴硕), Wang Zhenlu (王震鲁)

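College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, Xinjiang, China

Engineering Research Center of Intelligent Agriculture, Ministry of Education, Urumqi 830052, Xinjiang, China

Xinjiang Agricultural Informatization Engineering Technology Research Center, Urumqi 830052, Xinjiang, China

Xinjiang Kexin Vocational and Technical College, Urumqi 830049, Xinjiang, China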

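Funding: Science and Technology Innovation 2030 Major Project of the Ministry of Science and Technology (2022ZD0115800); Major Science and Technology Special Project of Xinjiang Uygur Autonomous Region (2022A02011-4); Basic Scientific Research Funds Research Project for Universities of Xinjiang Uygur Autonomous Region (XJEDU2022J009)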

2024

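Shandong Agricultural Sciences (山东农业科学)
Shandong Academy of Agricultural Sciences; Shandong Agronomy Society; Shandong Agricultural University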

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.578
ISSN: 1001-4942
Year, volume (issue): 2024, 56(6)