
Research on a Class-Based Cosine Distance Clustering Method for Missing Value Imputation

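[Purpose] To address the high-dimensionality problem that arises when similarity is computed with Euclidean distance, a class-based cosine distance clustering method for missing value imputation is proposed. [Methods] First, the incomplete data set is divided into two groups (G1 and GIM). Second, the missing data in the GIM group are pre-filled using the cluster centers. Third, cosine distance is used to compute the correlation. Finally, the record in the G1 group with the smallest distance is selected to fill in the missing values. [Results] Experiments show that the method outperforms other imputation methods on both categorical and mixed data sets. [Conclusion] The method significantly improves accuracy, recall, F1-score, and imputation quality.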
A Study of Missing Value Imputation Methods for Class-based Cosine Distance Clustering
[Purposes] To solve the high-dimensionality problem caused by computing similarity with Euclidean distance, a class-based cosine distance clustering approach to missing value imputation is proposed. [Methods] First, the incomplete data set is divided into two groups (G1 and GIM); second, the missing data in the GIM group are pre-filled using the cluster centers; third, cosine distance is used to calculate the correlation; finally, the data in the G1 group with the smallest distance are selected to fill in the missing values. [Findings] The experimental results show that the proposed method outperforms other imputation methods on both categorical and mixed data sets. [Conclusions] The CBC-IM-COS method significantly improves accuracy, recall, F1-score, and imputation performance.
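
The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' CBC-IM-COS implementation; the abstract does not give enough detail for that. The sketch assumes purely numeric features, fully observed class labels, that G1 holds the complete records and GIM the incomplete ones, and that the per-class mean of G1 can stand in for the cluster center. The function names `cosine_distance` and `impute_class_based_cosine` are hypothetical.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two 1-D vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 1.0  # assumed convention when either vector is all zeros
    return 1.0 - float(np.dot(a, b) / denom)


def impute_class_based_cosine(X, y):
    """Fill NaNs in X with values from the nearest complete record of the same class.

    X: (n, d) float array, np.nan marks a missing entry.
    y: (n,) array of class labels (assumed fully observed).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    filled = X.copy()
    missing = np.isnan(X)
    incomplete = missing.any(axis=1)  # step 1: rows with NaNs form G_IM

    for cls in np.unique(y):
        in_cls = y == cls
        g1 = X[in_cls & ~incomplete]  # step 1: complete rows of this class (G1)
        if len(g1) == 0:
            continue  # no donor available, leave the NaNs in place
        center = g1.mean(axis=0)  # assumed stand-in for the cluster center

        for i in np.where(in_cls & incomplete)[0]:
            # Step 2: pre-fill the missing entries with the center's values.
            prefilled = np.where(missing[i], center, X[i])
            # Step 3: cosine distance from the pre-filled row to every G1 row.
            dists = [cosine_distance(prefilled, row) for row in g1]
            # Step 4: copy the missing entries from the closest G1 row.
            donor = g1[int(np.argmin(dists))]
            filled[i, missing[i]] = donor[missing[i]]
    return filled
```

Because cosine distance depends on the angle between records rather than their magnitude, the pre-fill in step 2 is needed so that incomplete rows have a value in every dimension before distances are computed. Categorical or mixed attributes, which the paper evaluates, would need an encoding step that this sketch leaves out.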

incomplete data; missing value imputation; clustering; cosine distance

Xia Tingting (夏婷婷), Lin Kang (林康), Zhang Xiaoyu (张潇予), Liu Haizhong (刘海忠)


Lanzhou Jiaotong University, Lanzhou, Gansu 730070

Beijing Normal University, Zhuhai, Guangdong 519087

School of Social and Behavioural Sciences, City University of Hong Kong, Lanzhou, Gansu 730070

incomplete data; missing value imputation; clustering; cosine distance

2024

Henan Science and Technology (河南科技)
Henan Provincial Institute of Scientific and Technical Information


Impact factor: 0.615
ISSN:1003-5168
Year, volume (issue): 2024, 51(8)