Citizen Hotline Text Classification Based on Short Text Extension and Feature Fusion

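Citizen hotline records are mostly short texts with sparse features. To address this, a short text extension method and a text classification model based on dual-channel feature fusion (BERT-BiGRU-TextCNN, BGTC) are proposed for the automatic recognition and categorization of citizen hotline texts. First, a core vocabulary is built with the TF-IWF model and the LDA topic model. Next, Word2Vec is used to compute word similarity, which extends both the content of the short texts and their word-vector features. Finally, the extended texts are classified by the BGTC model, which fuses the feature information of two channels, BERT-TextCNN and BERT-BiGRU-Attention. Multiple comparative experiments show that the method performs better than the compared approaches on the citizen hotline text classification task, reaching an accuracy of 85.6% and an F1 score of 85.8%.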
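The first step, building a core vocabulary with TF-IWF, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It uses the common TF-IWF definition (a word's frequency within a text multiplied by the log of total corpus tokens over that word's corpus frequency). It also assumes the texts are already segmented into words, and it replaces the LDA topic-word component with a hypothetical rule that keeps the `top_n` highest-scoring words of each text. The names `tf_iwf` and `core_vocabulary` and the toy corpus are invented for illustration.

```python
import math
from collections import Counter


def tf_iwf(docs):
    """Return one {word: TF-IWF score} dict per tokenized document.

    TF(w, d) = count of w in d / number of tokens in d
    IWF(w)   = log(total tokens in corpus / occurrences of w in corpus)
    """
    corpus_counts = Counter(w for doc in docs for w in doc)
    total_tokens = sum(corpus_counts.values())
    scores = []
    for doc in docs:
        doc_counts = Counter(doc)
        scores.append({
            w: (c / len(doc)) * math.log(total_tokens / corpus_counts[w])
            for w, c in doc_counts.items()
        })
    return scores


def core_vocabulary(docs, top_n=2):
    """Union of the top_n highest-scoring words of every document.

    Hypothetical selection rule: the paper also draws on LDA topic words,
    which this sketch does not model. Ties are broken alphabetically.
    """
    vocab = set()
    for doc_scores in tf_iwf(docs):
        ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        vocab.update(w for w, _ in ranked[:top_n])
    return vocab


if __name__ == "__main__":
    # Toy, already-segmented hotline texts (invented for illustration).
    docs = [
        ["road", "streetlight", "broken", "road"],
        ["water", "outage", "building"],
        ["noise", "construction", "night", "noise"],
    ]
    print(sorted(core_vocabulary(docs)))
    # ['broken', 'building', 'construction', 'noise', 'outage', 'road']
```

Words that repeat inside one short text but stay rare across the corpus (such as "road" and "noise" here) receive the highest weights, which is why TF-IWF suits picking topic-bearing words from sparse texts.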
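The Word2Vec-based extension step can be sketched in the same spirit. The abstract does not state the exact expansion rule, so this sketch assumes one: for each core word in a short text, the most similar vocabulary words by cosine similarity are appended when their similarity reaches a threshold. The toy vectors stand in for trained Word2Vec embeddings, and the `threshold` and `top_k` defaults, like the names `cosine` and `expand_text`, are hypothetical. Only content extension is shown; the word-vector feature extension is not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_text(tokens, core_vocab, vectors, threshold=0.8, top_k=1):
    """Append words similar to the core words of a short text.

    threshold and top_k are hypothetical defaults, not values from the paper.
    """
    expanded = list(tokens)
    for word in tokens:
        if word not in core_vocab or word not in vectors:
            continue  # only core words with a known vector trigger expansion
        candidates = sorted(
            ((cosine(vectors[word], vec), other)
             for other, vec in vectors.items()
             if other != word and other not in expanded),
            reverse=True,  # most similar first
        )
        for sim, other in candidates[:top_k]:
            if sim >= threshold:
                expanded.append(other)
    return expanded


if __name__ == "__main__":
    # Toy 3-d vectors standing in for trained Word2Vec embeddings.
    vectors = {
        "streetlight": [0.9, 0.1, 0.0],
        "lamp": [0.85, 0.15, 0.05],
        "road": [0.5, 0.5, 0.0],
        "water": [0.0, 0.1, 0.9],
        "pipe": [0.05, 0.2, 0.85],
    }
    print(expand_text(["streetlight", "broken"], {"streetlight"}, vectors))
    # ['streetlight', 'broken', 'lamp']
```

The threshold controls how much noise the extension can add: "road" has "lamp" as its nearest neighbour at a similarity of about 0.82, so it gains a word at 0.8 but none at 0.9.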

Keywords: citizen hotline; short text extension; text classification; feature fusion

Guo Xiaolei, Zhang Wubo


School of Electrical and Information Engineering, Hubei University of Automotive Technology, Shiyan, Hubei 442002, China


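Funding: Key Research Project of Hubei Province; Hubei Province Special Project for Central Government-Guided Local Science and Technology Development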

TA020022018ZYYD007

2024

Journal of Shanxi Datong University (Natural Science Edition)
Shanxi Datong University

Impact factor: 0.271
ISSN: 1674-0874
Year, volume (issue): 2024, 40(1)