Virtual sample generation method using reduced feature probability density distribution

Modeling data for difficult-to-measure parameters in complex industrial processes, such as product quality and environmental indicators, are characterized by small sample sizes and sparse distributions. To address this, a virtual sample generation (VSG) method based on the probability density distribution (PDF) of reduced features is proposed for modeling data augmentation. First, principal component analysis (PCA) is used to reduce the features of the small-sample data, and kernel density estimation (KDE) is performed on the resulting independent principal components to generate candidate virtual principal components; these are orthogonally sampled and then reconstructed to obtain the virtual sample inputs. Next, to balance the accuracy and randomness of the mapping model, an ensemble mapping model is built from a random forest (RF) and a random weight neural network (RWNN) to obtain the virtual sample outputs. Finally, the parameters that affect the quality of the virtual samples, namely the principal component contribution rate, the KDE smoothing index, the numbers of candidate virtual principal components and virtual samples, the mapping model learning parameters, and the ensemble weights, are optimized with the comprehensive learning particle swarm optimization (CLPSO) algorithm to obtain the optimal virtual samples. Experiments on a benchmark dataset and dioxin (DXN) datasets from a municipal solid waste incineration process demonstrate the rationality and effectiveness of the proposed VSG method.
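
The PCA, KDE, reconstruction, and mapping steps summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and it makes several assumptions: all function names are hypothetical, the KDE uses a Gaussian kernel with a Silverman-style bandwidth scaled by a `smooth` factor, plain independent random sampling per component stands in for the orthogonal sampling step, only the RWNN half of the ensemble mapping model is shown (the RF member and the ensemble weights are omitted), and the CLPSO tuning of the parameters is left out entirely.

```python
import numpy as np

def pca_fit(X, contribution=0.9):
    """Keep the fewest principal components whose cumulative variance
    ratio reaches `contribution`. Returns mean, loadings P, scores T."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(ratio, contribution)) + 1, len(s))
    P = Vt[:k].T                      # (n_features, k) loading matrix
    return mu, P, Xc @ P              # scores are uncorrelated per column

def kde_sample(scores, n_virtual, smooth=1.0, rng=None):
    """Draw from a 1-D Gaussian KDE fitted to each component separately:
    pick a real score as the kernel centre, then add Gaussian noise."""
    rng = np.random.default_rng(rng)
    n, k = scores.shape
    # Assumption: Silverman-style bandwidth, scaled by a smoothing factor.
    h = smooth * 1.06 * scores.std(axis=0, ddof=1) * n ** (-0.2)
    idx = rng.integers(0, n, size=(n_virtual, k))
    centres = np.take_along_axis(scores, idx, axis=0)
    return centres + rng.normal(size=(n_virtual, k)) * h

def rwnn_fit(X, y, n_hidden=20, reg=1e-3, rng=None):
    """Random weight neural network: hidden weights are random and fixed,
    only the output weights are solved by regularized least squares."""
    rng = np.random.default_rng(rng)
    W = rng.uniform(-1, 1, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1, 1, size=n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def rwnn_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

def generate_virtual_samples(X, y, n_virtual=50, contribution=0.9,
                             smooth=1.0, seed=0):
    rng = np.random.default_rng(seed)
    mu, P, T = pca_fit(X, contribution)
    T_virtual = kde_sample(T, n_virtual, smooth, rng)
    X_virtual = T_virtual @ P.T + mu          # reconstruct virtual inputs
    model = rwnn_fit(X, y, rng=rng)           # mapping model on real data
    y_virtual = rwnn_predict(model, X_virtual)
    return X_virtual, y_virtual

# Toy small-sample data (synthetic, for illustration only).
rng = np.random.default_rng(42)
X_small = rng.normal(size=(30, 5))
y_small = X_small[:, 0] - 2.0 * X_small[:, 1] + 0.1 * rng.normal(size=30)
X_virt, y_virt = generate_virtual_samples(X_small, y_small, n_virtual=100)
print(X_virt.shape, y_virt.shape)
```

In this sketch, `contribution`, `smooth`, `n_virtual`, and `n_hidden` correspond loosely to the kinds of parameters the abstract says are tuned by CLPSO; here they are simply fixed by hand.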

Keywords: virtual sample generation; principal component analysis; probability density distribution; kernel density estimation; comprehensive learning particle swarm; mixed modeling sample

Tang Jian, Cui Canlin, Wang Dandan, Qiao Junfei

Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Beijing Laboratory of Smart Environmental Protection, Beijing 100124, China

2024

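Control Theory & Applications
South China University of Technology; Academy of Mathematics and Systems Science, Chinese Academy of Sciences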

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.076
ISSN: 1000-8152
Year, volume (issue): 2024, 41(11)