Heterogeneity Analysis of High-Dimensional Data with Mixed Types of Covariates
In the era of big data, high-dimensional survey data with mixed types of covariates pose challenges for heterogeneity analysis and the associated variable selection. This paper proposes a novel sparse clustering method and illustrates its application using the China Education Panel Survey and the "Thousands of People and Hundreds of Villages" social survey as examples. The method introduces an adjusted DBI criterion to measure the importance of covariates, uses different penalty parameters to control the weights of different types of covariates, and obtains the optimal clustering results together with the significant covariates. At the theoretical level, the paper establishes the variable screening consistency of the proposed method. At the numerical level, a series of simulation experiments verifies that the proposed method performs well in both clustering and variable selection. The empirical results also show that the clusters produced by the proposed method are highly discriminative, which makes it convenient for researchers to characterize each group; at the same time, the selected variables have important practical meaning. The dimensionality of the data is reduced without losing information, and the interpretability of the model is increased. The proposed sparse clustering analysis enables the joint analysis of mixed types of covariates in high-dimensional survey data, which greatly improves the utilization of information.
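As a rough illustration of how a DBI-type criterion can score the importance of a single covariate, the sketch below computes the standard Davies-Bouldin index one covariate at a time for a fixed cluster assignment. It assumes that "DBI" refers to the Davies-Bouldin index; the abstract does not define the adjustment or the type-specific penalty parameters, so neither is reproduced here, and the function names `davies_bouldin_1d` and `rank_covariates` are hypothetical. Only numeric covariates are handled; categorical covariates would need a different dissimilarity.

```python
def _mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    xs = list(xs)
    return sum(xs) / len(xs)


def davies_bouldin_1d(values, labels):
    """Standard (unadjusted) Davies-Bouldin index for one numeric covariate.

    Lower values mean the clusters are better separated on this covariate.
    """
    # Group the covariate values by cluster label.
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    centers = {lab: _mean(vs) for lab, vs in clusters.items()}
    # Within-cluster scatter: mean absolute deviation from the cluster center.
    scatter = {lab: _mean(abs(v - centers[lab]) for v in vs)
               for lab, vs in clusters.items()}

    worst_ratios = []
    for i in clusters:
        ratios = []
        for j in clusters:
            if i == j:
                continue
            gap = abs(centers[i] - centers[j])
            # Identical centers: this covariate cannot separate the two clusters.
            ratios.append(float("inf") if gap == 0
                          else (scatter[i] + scatter[j]) / gap)
        # For each cluster, keep its worst (largest) similarity ratio.
        worst_ratios.append(max(ratios))
    return _mean(worst_ratios)


def rank_covariates(columns, labels):
    """Return covariate names sorted from lowest to highest DBI."""
    scores = {name: davies_bouldin_1d(col, labels)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get)


# Toy data (made up for illustration): one covariate that tracks the
# clusters and one that does not.
labels = [0, 0, 1, 1]
columns = {
    "noise": [0.0, 10.0, 1.0, 11.0],
    "informative": [0.0, 1.0, 10.0, 11.0],
}
ranking = rank_covariates(columns, labels)
```

In this toy example the covariate that follows the cluster structure receives the lower index and is ranked first. A sparse clustering procedure of the kind described in the abstract would additionally penalize covariates so that uninformative ones are dropped, with separate penalty levels per covariate type.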