Computer Engineering and Design, 2024, Vol. 45, Issue 7: 2074-2081. DOI: 10.16208/j.issn1000-7024.2024.07.021

Short text classification enhanced with probabilistic category features

Liao Liefa¹, Li Kui², Yao Xiu²

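Author information

  • 1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000; President's Office, Jiangxi Modern Polytechnic College, Nanchang, Jiangxi 330095
  • 2. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000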

Abstract

To address the problem that the limited information contained in short texts makes classification accuracy hard to improve, a short text classification network model enhanced with probabilistic category features, FT_BDCNN, was proposed. First, N-gram processing produced an N-gram dictionary, and TF-IDF was applied to it to separate out feature information that discriminates between categories probabilistically (the FT module). Second, the vectorized text was fed into an improved feature extraction module. Finally, the outputs of the two modules were fused to complete the text classification. Experimental results show that the proposed model reaches an F1 score of 91.91% on the THUCNews dataset. The FT module can also be combined with existing classification models to improve their classification performance.
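The abstract gives no formulas for the FT module, so the Python sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes character n-grams (n = 1 to 3, convenient for unsegmented Chinese text such as THUCNews), treats each category's pooled training texts as a single TF-IDF "document" with IDF = log(K/df) over the K categories, and normalizes a text's summed n-gram weights into a distribution over categories. All helper names (`char_ngrams`, `build_category_table`, `category_feature`, `fuse`) are hypothetical, and fusion by plain concatenation is an assumed stand-in for the paper's fusion step.

```python
import math
from collections import Counter


def char_ngrams(text, n_values=(1, 2, 3)):
    """All character n-grams of `text` for each n in `n_values` (assumed n-gram scheme)."""
    grams = []
    for n in n_values:
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def build_category_table(texts, labels, n_values=(1, 2, 3)):
    """Map each n-gram in the training dictionary to one TF-IDF weight per category.

    Each category's texts are pooled into one document. TF is the n-gram's relative
    frequency within that category; IDF = log(K / df), where K is the number of
    categories and df the number of categories containing the n-gram, so an n-gram
    seen in every category gets weight 0 (it carries no category information).
    """
    categories = sorted(set(labels))
    counts = {c: Counter() for c in categories}
    for text, label in zip(texts, labels):
        counts[label].update(char_ngrams(text, n_values))
    totals = {c: max(sum(counts[c].values()), 1) for c in categories}  # avoid 0 division
    vocab = set().union(*(counts[c].keys() for c in categories))
    k = len(categories)
    table = {}
    for gram in vocab:
        df = sum(1 for c in categories if counts[c][gram] > 0)
        idf = math.log(k / df)
        table[gram] = [counts[c][gram] / totals[c] * idf for c in categories]
    return categories, table


def category_feature(text, categories, table, n_values=(1, 2, 3)):
    """Probabilistic category feature: summed n-gram weights normalized to sum to 1."""
    scores = [0.0] * len(categories)
    for gram in char_ngrams(text, n_values):
        for i, weight in enumerate(table.get(gram, ())):  # unknown n-grams add nothing
            scores[i] += weight
    total = sum(scores)
    if total == 0:  # no discriminative n-gram found: fall back to a uniform distribution
        return [1.0 / len(categories)] * len(categories)
    return [s / total for s in scores]


def fuse(deep_features, ft_features):
    """Assumed fusion step: concatenate the extractor's features with the FT vector."""
    return list(deep_features) + list(ft_features)


if __name__ == "__main__":
    texts = ["股市上涨", "股票基金", "球队比赛", "足球比赛"]
    labels = ["finance", "finance", "sports", "sports"]
    cats, table = build_category_table(texts, labels)
    ft = category_feature("股市", cats, table)
    print(cats, ft)  # ['finance', 'sports'] [1.0, 0.0]
    # [0.2, 0.7] is a placeholder for the output of a neural feature extractor.
    print(fuse([0.2, 0.7], ft))  # [0.2, 0.7, 1.0, 0.0]
```

Because an n-gram that occurs in every category receives an IDF of 0 in this sketch, only category-discriminative n-grams shape the vector; in a full model the fused vector would then feed a classification layer.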

Key words

category feature enhancement/short text/double pooling/feature fusion/statistical algorithms/fast classification/deep learning


Funding

National Natural Science Foundation of China (71462018)

National Natural Science Foundation of China (71761018)

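Publication year

2024
Computer Engineering and Design
Institute 706, Second Academy of China Aerospace Science and Industry Corporation

Indexing: CSTPCD; Peking University Core Journal
Impact factor: 0.617
ISSN: 1000-7024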