Design of Text Data Feature Extraction and Classification Technology Based on CNN-GRU

To address the insufficient text feature extraction and low classification accuracy of current Chinese text classification, this paper proposes a keyword extraction model based on E-TF-IDF (Expand-Term Frequency-Inverse Document Frequency) and a text classification model based on CNN-GRU (Convolutional Neural Networks-Gated Recurrent Unit). The E-TF-IDF model expands the keyword set according to the occurrence probability of the words adjacent to each keyword, which yields better keyword feature extraction. CNN-GRU is better suited to sequence classification and has fewer parameters, which reduces the risk of overfitting on small datasets. The experimental results show that CNN-GRU achieves high classification accuracy, averaging 97.88%.
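
The abstract does not give the E-TF-IDF formula, so the following is only a minimal sketch of the general idea it describes: score terms with plain TF-IDF, then expand the top-scoring keywords with words that frequently follow them. The function names, the use of right-hand neighbours only, and the `top_k` and `min_prob` parameters are all assumptions made for illustration, not the paper's method. Documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {term: tf-idf score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def neighbor_probabilities(docs):
    """Estimate P(next word | word) from adjacent-token counts in the corpus."""
    follow = defaultdict(Counter)
    for doc in docs:
        for left, right in zip(doc, doc[1:]):
            follow[left][right] += 1
    return {
        word: {nxt: c / sum(counter.values()) for nxt, c in counter.items()}
        for word, counter in follow.items()
    }


def expanded_keywords(docs, doc_index, top_k=1, min_prob=0.6):
    """Hypothetical expansion step: top-k TF-IDF terms plus likely neighbours."""
    scores = tf_idf(docs)[doc_index]
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    keywords = ranked[:top_k]
    probs = neighbor_probabilities(docs)
    expanded = list(keywords)
    for keyword in keywords:
        for nxt, p in sorted(probs.get(keyword, {}).items()):
            # Assumed rule: keep a neighbour if it follows the keyword often enough.
            if p >= min_prob and nxt not in expanded:
                expanded.append(nxt)
    return expanded


# Toy corpus (illustrative only, not data from the paper).
docs = [
    ["gated", "recurrent", "unit", "and", "gated", "recurrent", "network"],
    ["the", "unit", "and", "the", "network"],
    ["stock", "market", "and", "news"],
]
print(expanded_keywords(docs, 0))  # ['gated', 'recurrent']
```

In this toy corpus "gated" is the top keyword of the first document and is always followed by "recurrent", so the neighbour is added to the keyword set. Lowering `min_prob` admits weaker neighbours and enlarges the expanded set.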

Text classification; Feature extraction; E-TF-IDF; CNN-GRU

Miao Yuqi (苗玉琪)

Department of Social Management and Services, Hefei Preschool Education College (合肥幼儿师范高等专科学校), Hefei, Anhui 230013, China

Anhui Provincial Department of Education Quality Improvement and Excellence Cultivation Action Plan Project

hytyldsr21

2024

Journal of Guiyang University (Natural Sciences)
Guiyang University

Impact factor: 0.294
ISSN: 1673-6125
Year, volume (issue): 2024, 19(1)