Design of Text Data Feature Extraction and Classification Technology Based on CNN-GRU (基于CNN-GRU的文本数据特征提取及其分类技术设计)

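Abstract (translated from the Chinese): To address insufficient text feature extraction and low classification accuracy in current Chinese text classification, this paper proposes a keyword extraction model based on E-TF-IDF (Expand-Term Frequency-Inverse Document Frequency) and a text classification model based on CNN-GRU (Convolutional Neural Networks-Gated Recurrent Unit). The keyword extraction model expands each keyword according to the occurrence probability of the words adjacent to it, which yields better keyword feature extraction. CNN-GRU is better suited to sequence classification and has fewer parameters, which reduces the risk of overfitting on small datasets. The experimental results show that CNN-GRU achieves high classification accuracy, averaging 97.88%.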
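The abstract gives no formula for E-TF-IDF, so the following is only a minimal sketch of the idea it describes: score terms with standard TF-IDF, then expand the top keywords with neighboring words that co-occur with sufficiently high probability. The function names, the `top_k` and `min_prob` parameters, and the use of immediately adjacent tokens are all assumptions made for illustration, not the paper's method. Documents are assumed to be tokenized already (Chinese text would need word segmentation first).

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document (plain TF-IDF)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def neighbor_probs(docs):
    """Estimate P(neighbor | word) from immediately adjacent token pairs."""
    pair_counts = defaultdict(Counter)
    for doc in docs:
        for left, right in zip(doc, doc[1:]):
            pair_counts[left][right] += 1
            pair_counts[right][left] += 1
    return {
        word: {nb: c / sum(nbs.values()) for nb, c in nbs.items()}
        for word, nbs in pair_counts.items()
    }


def expand_keywords(docs, top_k=2, min_prob=0.5):
    """Assumed expansion rule: keep the top_k TF-IDF terms of each document,
    then append any neighbor whose estimated probability is >= min_prob."""
    scores = tf_idf(docs)
    probs = neighbor_probs(docs)
    result = []
    for doc_scores in scores:
        keywords = sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_k]
        expanded = list(keywords)
        for kw in keywords:
            for nb, p in probs.get(kw, {}).items():
                if p >= min_prob and nb not in expanded:
                    expanded.append(nb)
        result.append(expanded)
    return result


if __name__ == "__main__":
    corpus = [["a", "b"], ["a", "c"]]
    print(expand_keywords(corpus, top_k=1, min_prob=0.5))
```

In this toy corpus the term `a` appears in every document, so its TF-IDF score is zero and it is never selected as a keyword on its own; it only enters the result through the neighbor expansion step.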

English title: Design of Text Data Feature Extraction and Classification Technology Based on CNN-GRU

English abstract: In response to the shortcomings of insufficient text feature extraction and low classification accuracy in current Chinese text classification, a keyword extraction model based on E-TF-IDF (Expand-Term Frequency-Inverse Document Frequency) and a text classification model based on CNN-GRU (Convolutional Neural Networks-Gated Recurrent Unit) are proposed. The keyword extraction model expands keywords based on the occurrence probability of the words adjacent to them, in order to achieve better keyword feature extraction. CNN-GRU is more suitable for sequence classification and has fewer parameters, which can reduce the risk of overfitting on small datasets. The final experimental results show that the classification accuracy of CNN-GRU is high, with an average of 97.88%.
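The abstract does not say which model CNN-GRU has "fewer parameters" than. Assuming the comparison is with an LSTM layer of the same size (an assumption, not stated in the source), the difference can be sketched by counting weights: a GRU has three weight blocks (update gate, reset gate, candidate state) where an LSTM has four. The count below uses one bias vector per block; some libraries store two, which changes the totals but not the 3:4 ratio. The dimensions 128 and 64 are illustrative only.

```python
def recurrent_params(input_dim, hidden_dim, n_blocks):
    """Parameters of a recurrent layer built from n_blocks gate-like blocks,
    each with input weights, recurrent weights, and one bias vector."""
    per_block = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return n_blocks * per_block


def gru_params(input_dim, hidden_dim):
    # GRU: update gate, reset gate, candidate state
    return recurrent_params(input_dim, hidden_dim, 3)


def lstm_params(input_dim, hidden_dim):
    # LSTM: input gate, forget gate, output gate, cell candidate
    return recurrent_params(input_dim, hidden_dim, 4)


if __name__ == "__main__":
    d, h = 128, 64  # illustrative sizes, not taken from the paper
    print(gru_params(d, h), lstm_params(d, h))
```

Under this convention the GRU layer always has three quarters of the parameters of the equally sized LSTM layer, which is consistent with the abstract's point about a lower overfitting risk on small datasets.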

English keywords:

Text classification; Feature extraction; E-TF-IDF; CNN-GRU

Author:

Miao Yuqi (苗玉琪)

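Author affiliation:

Department of Social Management and Services, Hefei Preschool Education College (合肥幼儿师范高等专科学校), Hefei, Anhui 230013, China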

Keywords:

text classification (文本分类); feature extraction (特征提取); E-TF-IDF; CNN-GRU

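Funding:

Quality Improvement and Excellence Cultivation Action Plan Project of the Anhui Provincial Department of Education (安徽省教育厅提质培优行动计划项目)

Project number:

hytyldsr21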

Publication year:

2024

Journal of Guiyang University (Natural Sciences) (贵阳学院学报(自然科学版))

Guiyang University (贵阳学院)

Impact factor: 0.294

ISSN: 1673-6125

Year, volume (issue): 2024, 19(1)

Number of references: 11