Construction of a Multi-Label Classification System for Academic Papers Based on the Stacking Model
刘爱琴 1, 郭少鹏 1
Abstract
High-quality automatic multi-label classification of academic papers is a key step in advancing academic research. This study uses the Stacking model to fuse five classifiers (random forest, support vector machine, extremely randomized trees, extreme gradient boosting, and a neural network) into a heterogeneous ensemble classifier, and applies that classifier to multi-label classification of academic papers through a multi-binary classification model based on the problem transformation approach. To suit the characteristics of academic papers, a paper feature extraction module, a TF-IDF weighting module, and a data preprocessing module are implemented in turn, yielding a multi-label classification system for academic papers. Simulation experiments show that, on the multi-label classification of academic papers, the system improves to some degree on traditional single-model classifiers and homogeneous ensemble classifiers in generalization ability, stability, and accuracy. 9 figs. 21 refs.
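The problem-transformation step named in the abstract can be illustrated with a minimal sketch: each label becomes its own binary task, and one classifier is trained per task on TF-IDF-weighted features. Everything below is an assumption made for illustration, not the authors' implementation: the function and class names (`tfidf`, `to_binary_tasks`, `CentroidClassifier`, `BinaryRelevance`), the whitespace tokenizer, the `tf * log(N / df)` weighting variant, and the toy documents are all hypothetical. In particular, `CentroidClassifier` is only a simple stand-in for the base learner; in the paper that slot is filled by the five-model Stacking ensemble.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return vectors


def to_binary_tasks(label_sets):
    """Problem transformation: build one 0/1 target vector per label."""
    labels = sorted(set().union(*label_sets))
    return {label: [int(label in s) for s in label_sets] for label in labels}


class CentroidClassifier:
    """Hypothetical stand-in base learner (NOT the paper's Stacking ensemble)."""

    def fit(self, vectors, targets):
        # Average the sparse vectors of each class into a centroid.
        self.centroids = {}
        for cls in (0, 1):
            members = [v for v, t in zip(vectors, targets) if t == cls]
            centroid = Counter()
            for v in members:
                for term, weight in v.items():
                    centroid[term] += weight / len(members)
            self.centroids[cls] = centroid
        return self

    def predict(self, vectors):
        def score(v, cls):
            return sum(w * self.centroids[cls].get(t, 0.0) for t, w in v.items())
        # Predict 1 when the vector is closer (by dot product) to the positive centroid.
        return [int(score(v, 1) > score(v, 0)) for v in vectors]


class BinaryRelevance:
    """Train one independent binary classifier per label."""

    def __init__(self, make_classifier):
        self.make_classifier = make_classifier

    def fit(self, vectors, label_sets):
        self.models = {label: self.make_classifier().fit(vectors, targets)
                       for label, targets in to_binary_tasks(label_sets).items()}
        return self

    def predict(self, vectors):
        per_label = {label: m.predict(vectors) for label, m in self.models.items()}
        # Collect, for each document, every label whose classifier voted 1.
        return [{label for label, preds in per_label.items() if preds[i]}
                for i in range(len(vectors))]


# Toy data, invented purely for illustration.
docs = [
    "stacking ensemble of random forest classifiers",
    "neural network text classifier",
    "citation analysis in library science",
    "citation network of library science papers",
]
label_sets = [{"machine_learning"}, {"machine_learning"},
              {"library_science"}, {"library_science", "networks"}]

vectors = tfidf(docs)
model = BinaryRelevance(CentroidClassifier).fit(vectors, label_sets)
predicted = model.predict(vectors)
```

Because `BinaryRelevance` only requires `fit` and `predict` from its base learner, a stacked heterogeneous ensemble could be passed in place of `CentroidClassifier` without changing the transformation logic.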
Key words
Paper Classification / Stacking Model / Multi-Label Classification / Multi-Binary Classification Model
Publication year
2024