Text classification model for rare earth patents based on ERNIE-CAB-CNN

Rare earth patent texts are highly specialized, and existing text classification methods have shortcomings when applied to them. Motivated by the wide use and good results of category attention in computer vision, this paper proposes a Category Attention Block (CAB) for text classification and combines it with the pre-trained model ERNIE and a Convolutional Neural Network (CNN) to build ERNIE-CAB-CNN, a model for rare earth patent text classification. The model first uses ERNIE to vectorize the patent text, producing vector representations with richer semantic information. CAB then assigns higher weights to the features that are important for each category, so that the model can distinguish the features of different categories more accurately. Finally, the CNN extracts other key local features from the text, and the resulting text vector representation is used for classification. The experimental dataset was built from rare earth patent data retrieved and downloaded from the official website of the Patsnap patent database. On the test set, ERNIE-CAB-CNN achieves an accuracy of 82.68%, a precision of 83.2%, and an F1 score of 82.06%, indicating good classification performance.
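The three reported metrics can be illustrated with a small sketch. The abstract does not state how precision and F1 are averaged across patent categories, so the macro averaging used here is an assumption, and the function name `classification_metrics` and the category labels are hypothetical. This is not the authors' evaluation code.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro F1) for multi-class labels.

    Macro averaging is an assumption; the abstract does not specify
    which averaging scheme the authors used.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    accuracy = sum(tp.values()) / len(y_true)

    precisions, f1_scores = [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        f1_scores.append(f1)

    macro_precision = sum(precisions) / len(labels)
    macro_f1 = sum(f1_scores) / len(labels)
    return accuracy, macro_precision, macro_f1


# Hypothetical category labels, for illustration only.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
accuracy, macro_precision, macro_f1 = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.4f} precision={macro_precision:.4f} f1={macro_f1:.4f}")
```

Because accuracy counts every sample equally while macro-averaged precision and F1 weight every category equally, the three numbers can differ when categories are imbalanced or unevenly difficult.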

Keywords: rare earth patent classification; text classification; category attention; ERNIE; CNN; feature extraction

Liao Liefa, Shi Lijiao

School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China

2025

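Application of Electronic Technique (电子技术应用)
National Computer System Engineering Research Institute of China (the Sixth Research Institute of China Electronics Corporation)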

Impact factor: 0.567
ISSN: 0258-7998
Year, volume (issue): 2025, 51(1)