Research on Short Text Classification Method Based on BERTopic Topic Modeling and RoBERTa Algorithm
[Purpose/Significance] To address the feature sparsity problem in short text classification, this paper proposes a short text classification method based on topic probability feature expansion, implemented as a BERTopic-RoBERTa-PCA-CatBoost model. [Methods/Processes] The RoBERTa model is employed to obtain word vector representations of short texts. Topic probability feature vectors are extracted using the BERTopic topic model and then fused with the word vectors for feature expansion. Finally, the CatBoost algorithm is used for classification. [Limitations] For classification, deep learning algorithms have not been used for verification. For feature fusion, future work may consider alternative fusion methods. [Results/Conclusions] The proposed BERTopic-RoBERTa-PCA-CatBoost model improves accuracy by 10.90%, precision by 10.91%, and recall by 10.68% compared with the LDA-CatBoost model. The short text classification method based on topic probability feature expansion can overcome the limitations of the individual models and enhance the effectiveness of short text classification.
Short Text Classification; Word Vector; BERTopic Model; RoBERTa Model
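
The feature expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word vectors and topic probabilities below are synthetic stand-ins for RoBERTa and BERTopic outputs, the helper names `fuse_features` and `pca_reduce` are hypothetical, fusion is assumed to be simple concatenation, and PCA is assumed to be applied after fusion (the abstract does not specify either choice). The CatBoost classification step is omitted.

```python
import numpy as np

def fuse_features(word_vecs, topic_probs):
    """Concatenate per-document word vectors with topic probability vectors."""
    if word_vecs.shape[0] != topic_probs.shape[0]:
        raise ValueError("both feature matrices need one row per document")
    return np.hstack([word_vecs, topic_probs])

def pca_reduce(features, n_components):
    """Project features onto their top principal components via SVD."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Synthetic stand-ins (assumptions): 6 short texts, 8-dimensional vectors in
# place of RoBERTa embeddings, and 3-topic probability rows (each summing
# to 1) in place of BERTopic output.
rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(6, 8))
topic_probs = rng.dirichlet(np.ones(3), size=6)

fused = fuse_features(word_vecs, topic_probs)  # one expanded row per text
reduced = pca_reduce(fused, n_components=4)    # lower-dimensional features
```

In a full pipeline, `reduced` would serve as the input feature matrix for the classifier, with the same PCA projection applied to held-out texts.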