Imbalanced Ensemble Algorithm Based on Envelope Learning and Hierarchical Structure Consistency Mechanism

Ensemble methods are an important branch of imbalanced learning. However, existing imbalanced ensemble methods all operate on the original instances without considering their structure information, so their effectiveness remains limited. The structure information of instances includes both local and global structure information. To address this problem, this paper proposes an imbalanced ensemble learning algorithm based on a deep instance envelope network (DIEN) and a hierarchical structure consistency mechanism (HSCM). Taking local manifold and global structure information into account, the algorithm generates high-quality multi-layer envelope instances through multi-layer instance clustering, thereby balancing the classes. First, based on instance neighborhood concatenation and the fuzzy c-means clustering algorithm, the DIEN is designed to mine the structure information of the instances and obtain deep envelope instances. Then, a local manifold structure measure and a global structure distribution measure are designed to construct the HSCM, which enhances the distribution consistency of instances between layers. Next, the DIEN and the HSCM are combined to construct an optimized deep instance envelope network, DH (DIEN with HSCM). After that, base classifiers are applied to the envelope instances. Finally, a bagging ensemble learning mechanism is designed to fuse the predictions of the base classifiers into the final result. Several groups of experiments are reported at the end of the paper, using more than ten public datasets and representative related algorithms for verification and comparison. The experimental results show that the proposed algorithm performs significantly best on four performance metrics, including AUC (area under the curve) and F-measure.
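The abstract gives no implementation details, so the following is only a minimal sketch of the general envelope idea under stated assumptions, not the authors' DIEN. It concatenates each majority-class instance with its nearest neighbours, runs a plain fuzzy c-means on the result, and keeps the cluster centres as "envelope instances" so that the majority class shrinks to the size of the minority class. The function names (`neighbor_concat`, `fuzzy_c_means`, `envelope_layer`), the choice to keep only the leading coordinates of each centre, and the toy data are all hypothetical. The HSCM measures, the stacking of several layers, the base classifiers and the bagging step are omitted.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbor_concat(points, k):
    """Append the features of each point's k nearest neighbours to its own."""
    out = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        others.sort(key=lambda q: sq_dist(p, q))
        row = list(p)
        for q in others[:k]:
            row.extend(q)
        out.append(row)
    return out


def fuzzy_c_means(data, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means with fuzzifier m; returns the c cluster centres."""
    rng = random.Random(seed)
    centres = [list(v) for v in rng.sample(data, c)]
    dim = len(data[0])
    for _ in range(iters):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        # Distances are squared here, hence the exponent -1/(m-1).
        weights = []
        for x in data:
            inv = [(sq_dist(x, v) + 1e-12) ** (-1.0 / (m - 1.0)) for v in centres]
            total = sum(inv)
            weights.append([(w / total) ** m for w in inv])
        # Centre update: mean of the data weighted by u_ij^m.
        for j in range(c):
            denom = sum(w[j] for w in weights)
            centres[j] = [
                sum(w[j] * x[d] for w, x in zip(weights, data)) / denom
                for d in range(dim)
            ]
    return centres


def envelope_layer(majority, n_envelopes, k=2):
    """One hypothetical envelope layer: concatenate, cluster, keep centres."""
    dim = len(majority[0])
    concat = neighbor_concat(majority, k)
    centres = fuzzy_c_means(concat, n_envelopes)
    # Assumption: the leading `dim` coordinates of a centre act as the envelope.
    return [centre[:dim] for centre in centres]


# Toy data (made up for illustration): 40 majority vs. 8 minority instances.
rng = random.Random(1)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(8)]
envelopes = envelope_layer(majority, n_envelopes=len(minority))
print(len(envelopes), len(minority))
```

Because each centre is a positively weighted average of the concatenated instances, every envelope instance stays inside the range spanned by the original majority class. In a fuller pipeline one would presumably train a base classifier on the envelopes plus the minority instances and repeat this over bootstrap samples, but how the paper does so is not stated in the abstract.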

imbalanced learning; envelope learning; hierarchical structure consistency mechanism; local manifold structure measure; global structure distribution measure

LI Fan (李帆), ZHANG Xiaoheng (张小恒), LI Yongming (李勇明), WANG Pin (王品)

School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400030

Chongqing Radio and TV University, Chongqing 400052

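National Natural Science Foundation of China (61771080); National Natural Science Foundation of China (U21A20448); Fundamental Research Funds for the Central Universities (2022CDJJJ-003)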

2024

Acta Electronica Sinica
Chinese Institute of Electronics

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.237
ISSN: 0372-2112
Year, volume (issue): 2024, 52(3)