Random Subspace Sampling for Classification with Missing Data


Abstract: Many real-world datasets suffer from the unavoidable issue of missing values, and therefore classification with missing data has to be carefully handled since inadequate treatment of missing values will cause large errors. In this paper, we propose a random subspace sampling method, RSS, by sampling missing items from the corresponding feature histogram distributions in random subspaces, which is effective and efficient at different levels of missing data. Unlike most established approaches, RSS does not train on fixed imputed datasets. Instead, we design a dynamic training strategy where the filled values change dynamically by resampling during training. Moreover, thanks to the sampling strategy, we design an ensemble testing strategy where we combine the results of multiple runs of a single model, which is more efficient and resource-saving than previous ensemble methods. Finally, we combine these two strategies with the random subspace method, which makes our estimations more robust and accurate. The effectiveness of the proposed RSS method is well validated by experimental studies.
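The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is one plausible reading, assuming missing entries are marked with `None`, each feature gets an equal-width histogram built from its observed values, and a "model" is any callable that maps a filled row to a score. All function names (`build_histogram`, `resample_missing`, `random_subspace`, `ensemble_predict`) are hypothetical and chosen for illustration only.

```python
import random


def build_histogram(values, n_bins=10):
    """Return (bin_edges, counts) over the observed (non-None) values of one feature."""
    observed = [v for v in values if v is not None]
    lo, hi = min(observed), max(observed)
    if hi == lo:
        # Constant feature: a single zero-width bin that always yields that value.
        return [lo, hi], [len(observed)]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in observed:
        # The maximum value falls into the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


def sample_from_histogram(edges, counts, rng):
    """Pick a bin with probability proportional to its count, then draw uniformly inside it."""
    idx = rng.choices(range(len(counts)), weights=counts, k=1)[0]
    return rng.uniform(edges[idx], edges[idx + 1])


def resample_missing(rows, histograms, rng):
    """Return a copy of rows in which every None is replaced by a fresh histogram draw."""
    return [
        [sample_from_histogram(*histograms[j], rng) if v is None else v
         for j, v in enumerate(row)]
        for row in rows
    ]


def random_subspace(n_features, k, rng):
    """Choose k distinct feature indices (assumed here to define one random subspace)."""
    return sorted(rng.sample(range(n_features), k))


def ensemble_predict(model, rows, histograms, rng, n_runs=5):
    """Average the scores of a single model over n_runs independently resampled fills."""
    totals = [0.0] * len(rows)
    for _ in range(n_runs):
        for i, row in enumerate(resample_missing(rows, histograms, rng)):
            totals[i] += model(row)
    return [t / n_runs for t in totals]
```

Under these assumptions, dynamic training would amount to calling `resample_missing` again at every epoch rather than imputing once up front, and ensemble testing would amount to `ensemble_predict`, which reuses one trained model across several resampled fills instead of training several models.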

Keywords:

missing data; random subspace; neural network; ensemble learning

Authors:

曹云浩、吴建鑫

Affiliation:

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China

Funding:

National Natural Science Foundation of China; National Natural Science Foundation of China

Grant numbers:

61772256; 61921006

Publication year:

2024

DOI:

10.1007/s11390-023-1611-9

Journal of Computer Science and Technology (English edition)

China Computer Federation

CSTPCD

Impact factor: 0.432

ISSN: 1000-9000

Year, volume (issue): 2024, 39(2)