
An imputation method based on crow search algorithm for medical data

Missing medical data reduce statistical power, which in turn can severely affect diagnostic accuracy and may even lead to misdiagnosis. Selecting an effective imputation method for the missing data in medical problems is therefore very important. To impute missing medical data efficiently and improve the effectiveness of medical data mining, this paper proposes a medical data imputation method based on the crow search algorithm (CSA). A data imputation model is designed, and on that basis the mapping between the algorithm's individual encoding and the imputation model is established. The CSA is then applied to search iteratively for the optimal imputation model, which is finally used to construct a complete medical dataset. Comparative experiments are conducted on four medical datasets against two traditional imputation methods, mean imputation (MI) and K-nearest-neighbor imputation (KNNI). Datasets with different missing rates are constructed artificially, each imputation method is applied to them, and the accuracy of classification algorithms on the imputed datasets is used as the evaluation metric for the imputation methods. Compared with MI, the proposed method improves the average accuracy of the classification algorithms on the four datasets by 3.7%, 3.8%, 11.1%, and 17.7%, respectively; compared with KNNI, it improves the average accuracy by 4%, 14.8%, 12.6%, and 21.7%, respectively. These results indicate that the proposed CSA-based imputation method can effectively impute missing data and improve the performance of data mining algorithms.
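The abstract does not spell out the imputation model, the individual encoding, or the fitness function, so the following is only a minimal sketch of how a crow search loop could drive an imputation, not the authors' implementation. It uses the commonly described CSA update (a crow moves toward another crow's remembered position, or jumps to a random position when the followed crow is "aware"). Everything else is an assumption made for illustration: one gene per missing cell, features pre-scaled to [0, 1], a leave-one-out 1-nearest-neighbor error rate as a stand-in objective, and the hypothetical names `crow_search` and `impute_with_csa`.

```python
import random


def crow_search(fitness, dim, bounds, n_crows=20, n_iter=100,
                flight_length=2.0, awareness_prob=0.1, seed=0):
    """Minimise `fitness` over the box [lo, hi]^dim with a basic crow search."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_crows)]
    mem = [p[:] for p in pos]              # each crow's best remembered position
    mem_fit = [fitness(p) for p in pos]
    for _ in range(n_iter):
        for i in range(n_crows):
            j = rng.randrange(n_crows)     # crow i follows a randomly chosen crow j
            if rng.random() >= awareness_prob:
                # crow j is unaware: move toward its remembered position
                r = rng.random()
                new = [x + r * flight_length * (m - x)
                       for x, m in zip(pos[i], mem[j])]
            else:
                # crow j is aware: crow i is sent to a random position
                new = [rng.uniform(lo, hi) for _ in range(dim)]
            if all(lo <= v <= hi for v in new):   # accept only feasible moves
                pos[i] = new
                f = fitness(new)
                if f < mem_fit[i]:
                    mem[i], mem_fit[i] = new[:], f
    best = min(range(n_crows), key=mem_fit.__getitem__)
    return mem[best], mem_fit[best]


def impute_with_csa(rows, labels, **csa_kwargs):
    """Fill None cells in `rows`; assumes features are already scaled to [0, 1]."""
    holes = [(r, c) for r, row in enumerate(rows)
             for c, v in enumerate(row) if v is None]

    def fill(genes):
        # Assumed encoding: gene k is the value written into the k-th missing cell.
        filled = [row[:] for row in rows]
        for (r, c), g in zip(holes, genes):
            filled[r][c] = g
        return filled

    def fitness(genes):
        # Stand-in objective (an assumption): leave-one-out 1-NN error rate.
        data = fill(genes)
        errors = 0
        for i, x in enumerate(data):
            nearest = min((k for k in range(len(data)) if k != i),
                          key=lambda k: sum((a - b) ** 2
                                            for a, b in zip(x, data[k])))
            errors += labels[nearest] != labels[i]
        return errors / len(data)

    best, best_fit = crow_search(fitness, len(holes), (0.0, 1.0), **csa_kwargs)
    return fill(best), best_fit


# Toy data, invented purely for illustration (not from the paper's datasets).
toy_rows = [[0.1, 0.2], [0.2, None], [0.9, 0.8], [None, 0.9]]
toy_labels = [0, 0, 1, 1]
completed, error_rate = impute_with_csa(toy_rows, toy_labels, n_crows=10, n_iter=30)
print(completed, error_rate)
```

Optimising the fill values directly against a classifier-style objective mirrors the abstract's use of classification accuracy as the evaluation metric, but in a real study the objective and the held-out evaluation would need to be kept separate to avoid an optimistic estimate.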

Keywords: Evolutionary algorithm; Medical data; Data imputation; Crow search algorithm; Data mining

甄珍、刘昱鑫、陈斌、任海萍、刘亚芝


China National Medical Device Co., Ltd. (中国医疗器械有限公司), Beijing 100028

Sinopharm Medical Device Research Institute Co., Ltd. (国药集团医疗器械研究院有限公司), Beijing 100028


2024

Modern Instruments & Medical Treatment (现代仪器与医疗)
China National Scientific Instruments and Materials Corporation (中国科学器材公司)

Impact factor: 1.47
ISSN: 2095-5200
Year, volume (issue): 2024, 30(3)