Modeling data for difficult-to-measure parameters, such as industrial process quality indicators and environmental indicators, are characterized by small sample sizes and sparse distribution. A new virtual sample generation (VSG) method based on the probability density distribution (PDF) of reduced features is proposed for modeling data augmentation. First, principal component analysis (PCA) is used to reduce the feature dimension, and kernel density estimation (KDE) is performed on the resulting independent principal components to generate candidate virtual principal components. Using an orthogonal sampling approach, the virtual principal components are then used to reconstruct the inputs of the virtual samples. Next, to balance the accuracy and randomness of the mapping model, an ensemble mapping model is constructed from a random forest (RF) and a random weight neural network (RWNN) to obtain the outputs of the virtual samples. Finally, the factors that affect the quality of the virtual samples, namely the principal component contribution rate, the KDE smoothing index, the numbers of candidate virtual principal components and virtual samples, the mapping model parameters, and the ensemble weights, are selected by the comprehensive learning particle swarm optimization (CLPSO) algorithm to obtain the optimized virtual samples. Experimental results on a benchmark dataset and on dioxin (DXN) datasets from a municipal solid waste incineration process show the rationality and effectiveness of the proposed method.
Key words
virtual sample generation/principal component analysis/probability density distribution/kernel density estimation/comprehensive learning particle swarm/mixed modeling sample
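
The input-generation step described in the abstract (PCA reduction, KDE on each principal component, reconstruction of virtual inputs) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the Gaussian kernel, the Silverman-style bandwidth, and the default parameter values are all assumptions made for illustration. Each component is sampled independently here, which stands in for the orthogonal sampling approach that the abstract names but does not specify. The RF/RWNN ensemble mapping and the CLPSO parameter selection are not covered.

```python
import numpy as np


def fit_pca(X, contribution=0.95):
    """Return the feature mean and the leading principal axes.

    Keeps the smallest number of components whose cumulative explained
    variance reaches `contribution` (an assumed stand-in for the
    'principal component contribution rate' in the abstract).
    """
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(ratio, contribution)) + 1, len(s))
    return mean, vt[:k]


def sample_kde_1d(values, n, bandwidth_scale, rng):
    """Draw n samples from a Gaussian KDE fitted to a 1-D array.

    Sampling from a Gaussian KDE is equivalent to picking a data point
    at random and adding Gaussian noise with the kernel bandwidth.
    The bandwidth is a Silverman-style rule of thumb (assumption),
    scaled by `bandwidth_scale` as a hypothetical smoothing index.
    """
    h = bandwidth_scale * 1.06 * values.std(ddof=1) * len(values) ** (-0.2)
    centers = rng.choice(values, size=n, replace=True)
    return centers + h * rng.standard_normal(n)


def generate_virtual_inputs(X, n_virtual, contribution=0.95,
                            bandwidth_scale=1.0, seed=0):
    """Generate virtual input samples in the original feature space."""
    rng = np.random.default_rng(seed)
    mean, components = fit_pca(X, contribution)
    scores = (X - mean) @ components.T  # real samples in PC space
    # Sample each principal component independently (assumption).
    virtual_scores = np.column_stack([
        sample_kde_1d(scores[:, j], n_virtual, bandwidth_scale, rng)
        for j in range(scores.shape[1])
    ])
    # Map the virtual principal components back to the input space.
    return virtual_scores @ components + mean


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X_small = rng.normal(size=(30, 5))  # a small synthetic sample set
    X_virtual = generate_virtual_inputs(X_small, n_virtual=100)
    print(X_virtual.shape)
```

In this sketch, `contribution` and `bandwidth_scale` play the role of two of the quantities that the abstract says are tuned by CLPSO. Here they are simply fixed by hand.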