
Vicinal Distribution Based Denoising Diffusion Probabilistic Model

Tabular datasets with limited sample sizes lack invariance structure and sufficient samples, which makes it difficult for both traditional and generative data augmentation methods to produce diverse data that conforms to the original data distribution. To address this issue, a vicinal distribution based denoising diffusion probabilistic model (VD-DDPM) and its corresponding algorithm are proposed, based on the characteristics of tabular data and the principle of vicinal risk minimization. First, the features of tabular data with limited samples are analyzed, weakly correlated features are selected using prior knowledge, and a vicinal distribution is constructed for the samples. Then, the VD-DDPM is built on data sampled from the vicinal distribution, and the VD-DDPM data generation algorithm is used to generate a diverse dataset that conforms to the original data distribution. Experiments on multiple datasets, evaluating the quality of the generated data and the performance of downstream models, verify the effectiveness of VD-DDPM.
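The abstract gives no formulas, so the following is only a rough sketch of the two ingredients it names, not the authors' algorithm. It assumes a Gaussian vicinity that perturbs only the selected weakly correlated columns, followed by the standard DDPM forward noising step applied to the augmented rows. The function names, `weak_idx`, `sigma`, and `n_per_sample` are hypothetical choices made for illustration.

```python
import numpy as np


def sample_vicinal(X, weak_idx, sigma=0.05, n_per_sample=5, rng=None):
    """Draw neighbours of each row of X by perturbing only the columns in weak_idx.

    Assumption: a Gaussian vicinity whose width is sigma times each selected
    feature's standard deviation. The paper may define the vicinity differently.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    # Repeat every training row n_per_sample times.
    out = np.repeat(X, n_per_sample, axis=0)
    # Per-feature noise scale for the selected (weakly correlated) columns.
    scale = sigma * X[:, weak_idx].std(axis=0)
    noise = rng.normal(size=(out.shape[0], len(weak_idx))) * scale
    out[:, weak_idx] += noise
    return out


def forward_diffuse(x0, t, betas, rng=None):
    """Standard DDPM forward step: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    rng = np.random.default_rng(rng)
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps
```

In such a setup, a denoising network would be trained to predict `eps` from `x_t` and `t` on the rows returned by `sample_vicinal`, so that the small original table is replaced by a denser sample from its vicinity during training.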

Data Augmentation, Vicinal Risk Minimization, Vicinal Distribution, Diffusion Models, Tabular Data

Shi Hongbo (石洪波), Wan Bowen (万博闻), Zhang Ying (张赢)


School of Information, Shanxi University of Finance and Economics, Taiyuan 030031

College of Computer Science and Technology, Harbin Engineering University, Harbin 150009


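Central Government Guided Local Science and Technology Development Fund Project, Humanities and Social Sciences Research Project of the Ministry of Education

YDZJSX20231A057, 22YJAZH092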

2024

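Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Chinese Association of Automation, National Research Center for Intelligent Computing Systems, Hefei Institute of Intelligent Machines of the Chinese Academy of Sciences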


CSTPCD, Peking University Core Journals (北大核心)
Impact factor: 0.954
ISSN: 1003-6059
Year, volume (issue): 2024, 37(4)