Virtual sample generation method using reduced feature probability density distribution
Modeling data for difficult-to-measure parameters, such as industrial process quality indicators and environmental indicators, are characterized by small sample sizes and sparse distributions. A new virtual sample generation (VSG) method based on the probability density distribution (PDF) of reduced features is proposed for augmenting modeling data. First, principal component analysis (PCA) is used to reduce the feature dimension, and kernel density estimation (KDE) is performed on the resulting independent principal components to generate candidate virtual principal components. Using an orthogonal sampling approach, the virtual principal components are then used to reconstruct the inputs of the virtual samples. Next, to balance the accuracy and randomness of the mapping model, an ensemble mapping model is constructed from a random forest (RF) and a random weight neural network (RWNN) to obtain the outputs of the virtual samples. Finally, the factors that affect virtual sample quality, namely the principal component contribution rate, the KDE smoothing index, the numbers of candidate virtual principal components and virtual samples, the mapping model parameters, and the ensemble weights, are selected by the comprehensive learning particle swarm optimization (CLPSO) algorithm to obtain optimized virtual samples. Experimental results on a benchmark dataset and on dioxin (DXN) datasets from a municipal solid waste incineration process demonstrate the rationality and effectiveness of the proposed method.
virtual sample generation; principal component analysis; probability density distribution; kernel density estimation; comprehensive learning particle swarm; mixed modeling sample
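The input-generation stage described in the abstract (PCA reduction, KDE on each principal component, reconstruction of virtual inputs) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (pca_fit, sample_kde_1d, generate_virtual_inputs), the Gaussian kernel, the Silverman-style bandwidth, and the default parameter values are all assumptions made for illustration. The orthogonal sampling scheme is not specified in the abstract, so it is replaced here by independent draws per component. The RF/RWNN ensemble mapping and the CLPSO parameter selection are omitted.

```python
import numpy as np

def pca_fit(X, contribution=0.95):
    """Keep the leading principal components whose cumulative
    variance ratio reaches `contribution` (assumed threshold)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(var_ratio), contribution)) + 1
    k = min(k, len(s))
    components = Vt[:k]               # shape (k, n_features)
    scores = Xc @ components.T        # shape (n_samples, k)
    return mean, components, scores

def sample_kde_1d(values, n_samples, smoothing=1.0, rng=None):
    """Draw samples from a 1-D Gaussian KDE fitted to `values`.
    Sampling a Gaussian KDE amounts to picking a data point at random
    and adding Gaussian noise with standard deviation equal to the
    bandwidth. The bandwidth is a Silverman-style rule of thumb scaled
    by `smoothing` (a stand-in for the KDE smoothing index)."""
    rng = np.random.default_rng(rng)
    n = len(values)
    h = smoothing * 1.06 * values.std(ddof=1) * n ** (-0.2)
    centers = rng.choice(values, size=n_samples, replace=True)
    return centers + h * rng.normal(size=n_samples)

def generate_virtual_inputs(X, n_virtual, contribution=0.95,
                            smoothing=1.0, seed=0):
    """Generate virtual inputs by sampling each principal component
    independently (assumption) and mapping back to feature space."""
    rng = np.random.default_rng(seed)
    mean, components, scores = pca_fit(X, contribution)
    virtual_scores = np.column_stack([
        sample_kde_1d(scores[:, j], n_virtual, smoothing, rng)
        for j in range(scores.shape[1])
    ])
    # Inverse PCA transform: project scores back and restore the mean.
    return virtual_scores @ components + mean

# Synthetic small-sample data standing in for real process data.
rng = np.random.default_rng(42)
X_small = rng.normal(size=(20, 5)) @ rng.normal(size=(5, 5))
X_virtual = generate_virtual_inputs(X_small, n_virtual=100)
print(X_virtual.shape)  # (100, 5)
```

In the proposed method, the contribution rate, smoothing index, and number of virtual samples are among the quantities tuned by CLPSO. Here they are fixed by hand only to keep the sketch short.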