A Method for Filling Missing Values of Asthma Based on Improved Random Forest

Asthma data contain a large number of missing values, which makes asthma difficult to predict accurately. When the existing random forest algorithm is applied to fill in missing asthma data, its pre-filling step ignores the correlation between medical features, and the data are not updated promptly during filling, so the values in use are not up to date. To address these problems, an improved random forest algorithm is proposed. In the pre-filling stage, Pearson correlation analysis is used to construct more accurate regression equations that replace the pre-filling method of the random forest algorithm, and a pre-filling matrix is constructed to improve filling efficiency. In the filling stage, the column-by-column filling property of the random forest algorithm is exploited and a cyclic update mechanism based on local data is added: as soon as a column has been filled, the parameters of the regression equations are updated, and all parameters in the pre-filling matrix are then updated, which keeps the data synchronized. Experiments show that the improved random forest algorithm fills missing values better than the other algorithms compared and can effectively improve the accuracy of asthma diagnosis.
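The abstract does not state the regression equations themselves, so the following is only a minimal sketch of what a Pearson-guided pre-filling step might look like. It assumes that each missing entry is predicted from the single most strongly correlated other column by simple linear regression, with the column mean as a fallback. The names `pearson`, `fit_line`, and `prefill` are hypothetical, the toy data are invented for illustration, and the random forest filling stage with its cyclic update of the pre-filling matrix is not reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def prefill(rows):
    """Return a pre-filled copy of rows (list of lists, None = missing).

    Assumption (not from the paper): each column is regressed on the one
    other column with the largest |Pearson r| over jointly observed rows.
    Every column is assumed to have at least one observed value.
    """
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_mean = sum(observed) / len(observed)  # fallback value
        best_k, best_r, best_pairs = None, 0.0, None
        for k in range(n_cols):
            if k == j:
                continue
            # (predictor, target) pairs from rows where both are observed
            pairs = [(r[k], r[j]) for r in rows
                     if r[k] is not None and r[j] is not None]
            r_val = pearson([p[0] for p in pairs], [p[1] for p in pairs])
            if abs(r_val) > abs(best_r):
                best_k, best_r, best_pairs = k, r_val, pairs
        line = None
        if best_k is not None:
            line = fit_line([p[0] for p in best_pairs],
                            [p[1] for p in best_pairs])
        for i, r in enumerate(rows):
            if r[j] is not None:
                continue
            if line is not None and r[best_k] is not None:
                filled[i][j] = line[0] * r[best_k] + line[1]
            else:
                filled[i][j] = col_mean
    return filled


# Toy data in which the second column equals 2 * first + 1.
rows = [[1.0, 3.0], [2.0, 5.0], [3.0, None], [None, 9.0], [5.0, 11.0]]
filled = prefill(rows)
print(filled[2][1])  # close to 7.0, predicted from the first column
print(filled[3][0])  # close to 4.0, predicted from the second column
```

In the scheme the abstract describes, a matrix like `filled` would serve as the starting point for the column-by-column random forest filling, and the regression parameters would be re-estimated after each column is filled; this sketch stops before that stage.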

asthma disease; random forest algorithm; missing value completion; prefilled matrix; cyclic update mechanism

GONG Fengjie (巩凤杰), ZHOU Conghua (周从华)

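School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013

Jingkou New-Generation Information Technology Industry Institute, Jiangsu University, Zhenjiang 212013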

asthma; random forest algorithm; missing value handling; pre-filling matrix; cyclic update mechanism

2024

Computer & Digital Engineering (计算机与数字工程)
No. 709 Research Institute, China Shipbuilding Industry Corporation

CSTPCD
Impact factor: 0.355
ISSN: 1672-9722
Year, volume (issue): 2024, 52(8)