
IT Operation and Maintenance Data Classification Method Integrating Data Augmentation and Ensemble Learning

The rapid development of artificial intelligence for IT operations has created strong demand for the automatic classification of IT operation and maintenance data. Text classification methods based on deep learning have achieved better results than traditional machine learning methods. However, text classification on imbalanced datasets remains challenging, and a single neural network model cannot extract and combine the multi-dimensional information in text. To address this, the paper proposes an IT operation and maintenance data classification method that integrates data augmentation and ensemble learning. The method introduces a text data augmentation technique based on the TF-IDF keyword extraction algorithm and applies it to the few-sample classes to obtain a relatively balanced training dataset. In addition, TextCNN, TextRCNN, and FastText serve as base classifiers; each is trained and used for prediction separately, and the resulting probabilities are combined by soft voting to obtain the IT operation and maintenance data classification model. Theoretical analysis and experimental results show that, compared with traditional classification methods, the proposed method effectively addresses the data imbalance problem and achieves good classification results.
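The soft-voting step described in the abstract can be illustrated with a minimal sketch. It assumes each base classifier has already produced a per-class probability vector for every sample, and that the three models are weighted equally; the abstract does not say whether weights are used, so equal weighting is an assumption here. The function names and the probability values are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average class-probability vectors from several classifiers, sample by sample.

    prob_sets[m][i][c] is the probability that model m assigns class c to sample i.
    Equal model weights are an assumption of this sketch.
    """
    n_models = len(prob_sets)
    if n_models == 0:
        raise ValueError("need probabilities from at least one classifier")
    n_samples = len(prob_sets[0])
    averaged = []
    for i in range(n_samples):
        rows = [model_probs[i] for model_probs in prob_sets]
        n_classes = len(rows[0])
        averaged.append(
            [sum(row[c] for row in rows) / n_models for c in range(n_classes)]
        )
    return averaged


def predict(avg_probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the class index with the highest averaged probability."""
    return [max(range(len(p)), key=p.__getitem__) for p in avg_probs]


# Hypothetical outputs of the three base classifiers for 2 samples and 3 classes.
textcnn_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
textrcnn_probs = [[0.3, 0.5, 0.2], [0.1, 0.3, 0.6]]
fasttext_probs = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]

avg = soft_vote([textcnn_probs, textrcnn_probs, fasttext_probs])
labels = predict(avg)
print(labels)
```

Averaging probabilities rather than counting predicted labels lets a confident model outweigh less certain ones, which is the usual motivation for soft voting over hard voting.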

text classification; artificial intelligence for IT operations; data augmentation; deep learning; ensemble learning

Liu Xinquan (刘鑫泉), Xu Jian (徐建)


School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094


2024

Computer & Digital Engineering (计算机与数字工程)
709th Research Institute of China Shipbuilding Industry Corporation (中国船舶重工集团公司第七0九研究所)

CSTPCD
Impact factor: 0.355
ISSN: 1672-9722
Year, volume (issue): 2024, 52(12)