An Adversarial Autoencoder Oversampling Algorithm for Imbalanced Image Data


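Abstract (translated from the Chinese): Many traditional imbalanced-learning algorithms designed for low-dimensional data perform poorly on image data. Oversampling algorithms based on generative adversarial networks (GAN) can generate high-quality images, but they are prone to mode collapse under class imbalance. Oversampling algorithms based on autoencoders (AE) are easy to train, but the images they generate are of lower quality. To further improve both the quality of the samples generated when oversampling imbalanced image data and the stability of training, this paper builds on the ideas of GANs and autoencoders and proposes an oversampling algorithm that fuses the two (BAEGAN). First, a conditional embedding layer is introduced into the autoencoder, and the pre-trained conditional autoencoder is used to initialize the GAN in order to stabilize training. Second, the output structure of the discriminator is modified, and a loss function that combines focal loss with a gradient penalty is introduced to reduce the effect of class imbalance. Finally, the synthetic minority oversampling technique (SMOTE) is applied to the distribution map of the latent vectors to generate high-quality images. Experimental results on four image datasets show that the algorithm outperforms oversampling algorithms such as the conditional GAN with an auxiliary classifier (ACGAN) and the balancing GAN (BAGAN) in both generated image quality and classification performance after oversampling, and that it can effectively address class imbalance in image data.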
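
The abstract names focal loss as one ingredient of the discriminator loss but does not give the formula. As a rough illustration only, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It is not the paper's combined loss: the gradient-penalty term is omitted, and the function name `focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard binary focal loss for a single prediction (illustrative sketch).

    p: predicted probability of the positive class, in [0, 1]
    y: true label, 0 or 1
    gamma, alpha: assumed illustrative defaults, not values from the paper
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights the positive class by alpha and the negative by 1 - alpha.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy; larger `gamma` shifts the loss toward hard, misclassified examples, which is why the technique is commonly used under class imbalance.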

English title: Adversarial Autoencoders Oversampling Algorithm for Imbalanced Image Data

English abstract: Many traditional imbalanced learning algorithms that are suitable for low-dimensional data do not perform well on image data. Although oversampling algorithms based on Generative Adversarial Networks (GAN) can generate high-quality images, they are prone to mode collapse under class imbalance. Oversampling algorithms based on AutoEncoders (AE) are easy to train, but the generated images are of lower quality. To improve both the quality of the samples generated when oversampling imbalanced image data and the stability of training, this paper proposes a Balanced oversampling method with AutoEncoders and Generative Adversarial Networks (BAEGAN), which combines the ideas of GAN and AE. First, a conditional embedding layer is introduced into the autoencoder, and the pre-trained conditional autoencoder is used to initialize the GAN to stabilize model training. Then the output structure of the discriminator is improved, and a loss function that combines Focal Loss and gradient penalty is introduced to alleviate the impact of class imbalance. Finally, the Synthetic Minority Oversampling TEchnique (SMOTE) is applied to the distribution map of latent vectors to generate high-quality images. Experimental results on four image datasets show that the proposed algorithm outperforms oversampling methods such as Auxiliary Classifier Generative Adversarial Networks (ACGAN) and BAlancing Generative Adversarial Networks (BAGAN) in both generated image quality and classification performance after oversampling, and can effectively address the class imbalance problem in image data.
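
The final step in the abstract applies SMOTE to latent vectors rather than to pixels. A minimal sketch of that idea, under stated assumptions, follows: each synthetic latent vector is a random point on the segment between a minority-class latent vector and one of its k nearest neighbours in the same class. The function name `smote_latent`, the parameter `k=5`, and the plain-list representation are hypothetical; the encoder that produces the latent vectors and the generator that decodes the synthetic ones back into images are not shown, and the paper's actual procedure may differ.

```python
import random


def smote_latent(latents, n_new, k=5, rng=None):
    """SMOTE-style interpolation over latent vectors of one minority class.

    latents: list of equal-length lists of floats (assumed encoder outputs)
    n_new:   number of synthetic latent vectors to create
    k:       number of nearest neighbours to sample from (assumed default)
    """
    if len(latents) < 2:
        raise ValueError("need at least two latent vectors to interpolate")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(latents))
        base = latents[i]
        # Rank the other vectors by squared Euclidean distance to the base.
        others = [z for j, z in enumerate(latents) if j != i]
        others.sort(key=lambda z: sum((a - b) ** 2 for a, b in zip(base, z)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

In a full pipeline, each vector returned by `smote_latent` would be passed through the trained generator to obtain a synthetic minority-class image. Interpolating in latent space instead of pixel space avoids the blurry blends that pixel-level SMOTE tends to produce on images.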

English keywords:

Imbalanced image data; Oversampling; Generative Adversarial Networks (GAN); Adversarial AutoEncoders (AAE); Synthetic Minority Oversampling TEchnique (SMOTE)

Authors:

Zhi Weimei (职为梅), Chang Zhi (常智), Lu Junhua (卢俊华), Geng Zhengqian (耿正乾)

Author affiliation:

School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China

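Keywords (original Chinese terms):

imbalanced image data (不平衡图像数据); oversampling (过采样); generative adversarial networks (生成对抗网络); adversarial autoencoders (对抗自编码器); synthetic minority oversampling technique (合成少数类过采样技术)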

Publication year:

2024

DOI:

10.11999/JEIT240330

Journal of Electronics & Information Technology (电子与信息学报)

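Institute of Electronics, Chinese Academy of Sciences; Department of Information Sciences, National Natural Science Foundation of China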

CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 1.302

ISSN: 1009-5896

Year, volume (issue): 2024, 46(11)