A Study of Missing Value Imputation Methods for Class-based Cosine Distance Clustering
[Purposes] To address the high-dimensionality problem that arises when similarity is computed with Euclidean distance, a class-based cosine distance clustering approach to missing value imputation is proposed. [Methods] First, the incomplete data set is divided into two groups, G1 and GIM. Second, the missing values of the records in the GIM group are pre-filled with the cluster center. Third, cosine distance is used again to calculate the correlation between records. Finally, the record in the G1 group with the smallest distance is selected to fill the missing values. [Findings] Experimental results show that the proposed method outperforms other imputation methods on both categorical and mixed datasets. [Conclusions] The CBC-IM-COS method significantly improves accuracy, recall, F1-score, and overall imputation performance.
Keywords: incomplete data; missing value imputation; clustering; cosine distance
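The four steps in the abstract can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' CBC-IM-COS implementation. It assumes that G1 holds the complete records and GIM the records with at least one missing value; that every attribute is already numerically encoded (categorical attributes would first need an encoding such as one-hot); that clustering is a k-means-style loop using cosine distance for assignment, attribute means for centers, and a deterministic farthest-first initialisation; that a GIM record is assigned to a cluster using only its observed attributes; and that the donor search is restricted to G1 records in the same cluster. The names `cosine_distance`, `nearest`, `cluster`, and `impute` are illustrative.

```python
# Hypothetical sketch of cluster-based cosine-distance imputation;
# not the authors' implementation. Missing values are marked with None.
import math
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; a zero vector is treated as maximally distant (1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def nearest(v: Sequence[float], centers: List[List[float]]) -> int:
    """Index of the center with the smallest cosine distance to v."""
    return min(range(len(centers)), key=lambda j: cosine_distance(v, centers[j]))


def cluster(rows: List[List[float]], k: int, iters: int = 20) -> Tuple[List[List[float]], List[int]]:
    """k-means-style clustering: cosine distance for assignment, means for centers."""
    if len(rows) < k:
        raise ValueError("need at least k complete records")
    # Deterministic farthest-first initialisation (an assumption, not from the paper).
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(cosine_distance(r, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = [nearest(r, centers) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, [nearest(r, centers) for r in rows]


def impute(data: List[Row], k: int = 2) -> List[List[float]]:
    # Step 1: split records into complete (G1, assumed) and incomplete (GIM) groups.
    g1 = [[float(v) for v in r] for r in data if None not in r]
    centers, labels = cluster(g1, k)
    out: List[List[float]] = []
    for r in data:
        if None not in r:  # G1 records are returned unchanged
            out.append([float(v) for v in r])
            continue
        # Step 2: assign the GIM record to a cluster by its observed attributes,
        # then pre-fill its gaps with that cluster's center.
        obs = [i for i, v in enumerate(r) if v is not None]
        c = nearest([r[i] for i in obs], [[ctr[i] for i in obs] for ctr in centers])
        pre = [centers[c][i] if v is None else v for i, v in enumerate(r)]
        # Step 3: cosine distance from the pre-filled record to G1 records in the
        # same cluster (assumed; falls back to all of G1 if that cluster is empty).
        pool = [g for g, lab in zip(g1, labels) if lab == c] or g1
        donor = min(pool, key=lambda g: cosine_distance(pre, g))
        # Step 4: copy the nearest donor's values into the originally missing slots.
        out.append([donor[i] if v is None else float(v) for i, v in enumerate(r)])
    return out


if __name__ == "__main__":
    demo: List[Row] = [
        [1.0, 0.0, 1.0], [2.0, 0.1, 2.0], [0.0, 1.0, 0.0], [0.1, 2.0, 0.0],
        [1.5, None, 1.5], [None, 3.0, 0.05],
    ]
    for row in impute(demo, k=2):
        print(row)
```

Restricting the donor pool to one cluster is what makes the search "class-based" in this sketch; searching all of G1 instead would reduce it to plain cosine nearest-neighbour imputation.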