Journal of Guiyang University (Natural Sciences) [贵阳学院学报(自然科学版)], 2024, Vol. 19, Issue 1: 32-35, 41.

基于CNN-GRU的文本数据特征提取及其分类技术设计

Design of Text Data Feature Extraction and Classification Technology Based on CNN-GRU

苗玉琪

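Author information

  • 1. Department of Social Management and Services, Hefei Preschool Education College (合肥幼儿师范高等专科学校), Hefei, Anhui 230013, China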

Abstract

To address insufficient text feature extraction and low classification accuracy in current Chinese text classification, this paper proposes a keyword extraction model based on E-TF-IDF (Expand-Term Frequency-Inverse Document Frequency) and a text classification model based on CNN-GRU (Convolutional Neural Networks-Gated Recurrent Unit). The keyword extraction model expands each keyword according to the occurrence probability of its neighboring words, which yields better keyword features. CNN-GRU is better suited to sequence classification and has fewer parameters, which reduces the risk of overfitting on small datasets. Experimental results show that CNN-GRU achieves high classification accuracy, averaging 97.88%.
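The abstract does not give the E-TF-IDF formulas, so the following is only a minimal sketch of the general idea it describes: score terms with plain TF-IDF, then expand the top keywords with neighboring words that co-occur with high probability. Everything here is an assumption made for illustration, not the paper's method: the function names (`tf_idf`, `neighbor_probs`, `expanded_keywords`), the window size, the `min_prob` threshold, and the toy English documents, which stand in for Chinese text that has already been segmented into words.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document (plain TF-IDF)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def neighbor_probs(docs, window=1):
    """Estimate P(neighbor | term) from co-occurrence within `window` tokens."""
    pair_counts = defaultdict(Counter)
    for doc in docs:
        for i, term in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[term][doc[j]] += 1
    return {
        term: {nb: c / sum(counts.values()) for nb, c in counts.items()}
        for term, counts in pair_counts.items()
    }


def expanded_keywords(docs, doc_index, top_k=2, min_prob=0.5, window=1):
    """Pick the top_k TF-IDF terms of one document, then append likely neighbors.

    The expansion rule (keep neighbors with P(neighbor | keyword) >= min_prob)
    is an assumed stand-in for the paper's expansion step.
    """
    scores = tf_idf(docs)[doc_index]
    # Sort by descending score; break ties alphabetically for determinism.
    seeds = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    probs = neighbor_probs(docs, window)
    expanded = list(seeds)
    for seed in seeds:
        for nb, p in sorted(probs.get(seed, {}).items()):
            if p >= min_prob and nb not in expanded:
                expanded.append(nb)
    return expanded


# Hypothetical, pre-tokenized toy corpus.
docs = [
    ["neural", "network", "text", "classification"],
    ["neural", "network", "image", "recognition"],
    ["stock", "market", "price", "forecast"],
]
print(expanded_keywords(docs, 0, top_k=1))  # → ['classification', 'text']
```

In this toy run the single seed keyword "classification" is expanded with "text", its only neighbor. The expanded keyword list could then be fed to a downstream classifier such as the CNN-GRU model named in the abstract.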

Key words

Text classification/Feature extraction/E-TF-IDF/CNN-GRU


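Funding

Action Plan for Improving Quality and Cultivating Excellence project of the Anhui Provincial Department of Education (hytyldsr21)

Publication year

2024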
Journal of Guiyang University (Natural Sciences)
Guiyang University

Impact factor: 0.294
ISSN: 1673-6125
Number of references: 11