Simulation of Fast Classification of Mixed-Feature Text Based on Keyword Weighting

徐佳丽¹, 杨长红²

Author information

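1. School of Electronic and Information Engineering, Nanchang Normal College of Applied Technology, Nanchang, Jiangxi 330038, China
2. School of Mathematics and Computer Science, Jiangxi Science and Technology Normal University, Nanchang, Jiangxi 330038, China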

Abstract

Network information in electronic text form is not only large in volume, but its mixed features are also highly similar, which makes an even distribution of features hard to achieve. The uneven distribution of feature items across categories biases the calculation of text weights, hinders the extraction of category feature words, and makes text classification difficult. To address this, a fast classification method for mixed-feature text based on keyword weighting is proposed. First, the text is weighted with the term frequency-inverse document frequency (TF-IDF) method from information retrieval, and the frequency with which text keywords occur in the central set is calculated under different weights. Key features are then extracted according to a frequency threshold, and the class centers of the text set are determined. Next, the text data most strongly correlated with the class center are identified and correlation features are extracted. Finally, a neural network classification model is built: a set of texts with detailed features is preset and fed into the network as initial values, and each layer compares its input against the target features one by one to achieve effective classification. Experimental results show that the proposed method achieves a higher recall, with the recall of mixed-feature extraction exceeding 40%, which indicates better practical performance and accurate classification across different kinds of text sets.
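The first stages described in the abstract (TF-IDF weighting, threshold-based feature selection, class centers, and correlation with a center) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the threshold value 0.2, the use of cosine similarity as the correlation measure, and all function names are assumptions made for illustration, and the neural network stage is omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def key_features(weights, threshold):
    """Keep terms whose highest weight in any document reaches the threshold."""
    best = {}
    for w in weights:
        for term, value in w.items():
            best[term] = max(best.get(term, 0.0), value)
    return sorted(t for t, v in best.items() if v >= threshold)


def to_vector(weight, features):
    """Project a {term: weight} dict onto the selected feature list."""
    return [weight.get(t, 0.0) for t in features]


def cosine(a, b):
    """Cosine similarity, used here as an assumed correlation measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus with two classes.
docs = [
    ["goal", "match", "team", "goal"],
    ["team", "match", "score"],
    ["stock", "market", "price"],
    ["market", "price", "stock", "trade"],
]
labels = ["sports", "sports", "finance", "finance"]

weights = tfidf(docs)
features = key_features(weights, threshold=0.2)  # threshold is an assumption
vectors = [to_vector(w, features) for w in weights]

# Class center: mean feature vector of the documents in each class.
centers = {}
for label in set(labels):
    members = [v for v, l in zip(vectors, labels) if l == label]
    centers[label] = [sum(col) / len(members) for col in zip(*members)]

# Assign each document to the class center it correlates with most strongly.
predicted = [max(centers, key=lambda c: cosine(v, centers[c])) for v in vectors]
print(predicted)
```

In this sketch the class centers stand in for the preset feature set that the abstract says is fed to the neural network; a trained classifier would replace the final nearest-center step.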

Keywords

Keyword weighting / Mixed-feature text / Frequency threshold / Neural network classification model

Publication year

2024

Computer Simulation (计算机仿真)

The 17th Research Institute of China Aerospace Science and Industry Corporation

CSTPCD

Impact factor: 0.518

ISSN: 1006-9348

Number of references: 17
