Design of Text Data Feature Extraction and Classification Technology Based on CNN-GRU
In response to two shortcomings of current Chinese text classification, insufficient text feature extraction and low classification accuracy, this paper proposes a keyword extraction model based on E-TF-IDF (Expand Term Frequency-Inverse Document Frequency) and a text classification model based on CNN-GRU (Convolutional Neural Network-Gated Recurrent Unit). The E-TF-IDF model expands the extracted keywords according to the probability that adjacent keywords occur together, which yields better keyword features. CNN-GRU is well suited to sequence classification and has relatively few parameters, which reduces the risk of overfitting on small datasets. Experimental results show that CNN-GRU achieves high classification accuracy, averaging 97.88%.
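The abstract does not give the E-TF-IDF formulas, so the following is only a minimal Python sketch under stated assumptions. It computes standard TF-IDF scores with a smoothed IDF, then applies a hypothetical expansion step: a token is added to the keyword set when it immediately follows a top-ranked keyword with an estimated probability at or above a threshold. The function names `tf_idf` and `expand_keywords` and the parameters `top_k` and `threshold` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Standard TF-IDF for a list of tokenized documents.

    tf = term count / document length
    idf = log(N / df) + 1  (smoothed so a term present in every document keeps some weight)
    Returns one {term: score} dict per document.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append(
            {t: (c / total) * (math.log(n / df[t]) + 1.0) for t, c in counts.items()}
        )
    return scores


def expand_keywords(doc, scores, top_k=2, threshold=0.5):
    """Hypothetical expansion step (an assumption, not the paper's exact rule).

    Takes the top_k terms by TF-IDF, then for each one estimates
    P(next token = w | current token = keyword) from right-adjacent pairs in the
    document and appends w when that estimate is >= threshold.
    """
    # Rank by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    keywords = list(ranked)
    counts = Counter(doc)
    pairs = Counter(zip(doc, doc[1:]))  # right-adjacent token pairs (w_i, w_{i+1})
    for kw in ranked:
        for (left, right), c in pairs.items():
            if left == kw and c / counts[kw] >= threshold and right not in keywords:
                keywords.append(right)
    return keywords
```

On the toy corpus `[["deep", "learning", "model", "deep", "learning"], ["text", "classification", "model"], ["deep", "network", "text"]]`, the first document's top two TF-IDF terms are `learning` and `deep`. Because `learning` is followed by `model` in half of its occurrences, the default threshold of 0.5 adds `model`, giving `["learning", "deep", "model"]`. Whether the paper's method also considers left neighbors or merges adjacent keywords into phrases is not stated in the abstract.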
Keywords: Text classification; Feature extraction; E-TF-IDF; CNN-GRU