Henan Science and Technology, 2024, Vol. 51, Issue 8: 28-35. DOI: 10.19968/j.cnki.hnkj.1003-5168.2024.08.006

A Study of Missing Value Imputation Methods for Class-based Cosine Distance Clustering

Xia Tingting¹, Lin Kang², Zhang Xiaoyu³, Liu Haizhong¹

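Author information

  • 1. Lanzhou Jiaotong University, Lanzhou, Gansu 730070
  • 2. Beijing Normal University, Zhuhai, Guangdong 519087
  • 3. School of Social and Behavioral Sciences, City University of Hong Kong, Lanzhou, Gansu 730070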

Abstract

[Purposes] To address the high-dimensionality problem that arises when similarity is measured with Euclidean distance, a class-based cosine distance clustering method for missing value imputation is proposed. [Methods] First, the incomplete dataset is divided into two groups (G1 and GIM). Second, the missing data in the GIM group are pre-filled using the cluster centers. Third, cosine distance is used to compute the correlation. Finally, the record in the G1 group with the smallest distance is selected to fill the missing values. [Findings] Experimental results show that the proposed method outperforms other imputation methods on both categorical and mixed datasets. [Conclusions] The CBC-IM-COS method significantly improves accuracy, recall, F1-score, and imputation performance.
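The four steps in the abstract can be illustrated with a minimal Python sketch. This is not the authors' CBC-IM-COS implementation: it assumes numeric features only, assumes class labels are available, and takes the "cluster center" of each class to be the mean of that class's complete records. The function names `cosine_distance` and `impute_by_class_cosine` are hypothetical, and the sketch treats the complete records as the G1 group and the records with missing values as the GIM group.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b); defined as 1.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 1.0
    return 1.0 - float(np.dot(a, b)) / denom


def impute_by_class_cosine(X, y):
    """Fill NaNs in X using the nearest complete record of the same class.

    Assumptions (not taken from the paper): numeric features, known labels y,
    and the class center computed as the mean of the class's complete records.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    missing = np.isnan(X)
    incomplete_rows = missing.any(axis=1)  # rows assumed to form the GIM group
    filled = X.copy()

    for label in np.unique(y):
        in_class = y == label
        complete = X[in_class & ~incomplete_rows]  # complete rows of this class
        if complete.shape[0] == 0:
            continue  # no complete donor in this class; leave NaNs in place
        center = complete.mean(axis=0)

        for i in np.flatnonzero(in_class & incomplete_rows):
            # Step 2: pre-fill the missing entries with the class center.
            prefilled = np.where(missing[i], center, X[i])
            # Step 3: cosine distance to every complete record of the class.
            dists = [cosine_distance(prefilled, row) for row in complete]
            # Step 4: copy the missing entries from the closest complete record.
            donor = complete[int(np.argmin(dists))]
            filled[i, missing[i]] = donor[missing[i]]
    return filled
```

Pre-filling with the class center is what makes the cosine distance computable on a full-length vector; the final value is then taken from an observed record rather than from the center itself. Categorical and mixed attributes, which the abstract reports results on, would need an encoding step that this sketch does not attempt.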

Key words

incomplete data/missing value imputation/clustering/cosine distance

Publication year

2024
Henan Science and Technology
Henan Provincial Institute of Scientific and Technical Information

Impact factor: 0.615
ISSN: 1003-5168