Mass Imputation Inference of Non-Probability Samples and Probability Samples from the Perspective of Multi-Source Data Fusion
With the development of society, nonresponse rates in probability samples have become increasingly high, which may lead to missing values in target variables. At the same time, the growth of big data and web surveys has produced many non-probability samples. Combining the two kinds of samples to make inference about the population has therefore become a hot issue in multi-source data fusion. Assume that the target variable is completely missing in the probability sample, while the non-probability sample has complete data. We propose fitting a superpopulation local polynomial model to the non-probability sample and using it to impute the missing values of the target variable in the probability sample; the imputed probability sample is then used to estimate the population. The asymptotic properties of the proposed estimator are also derived. Simulation and empirical studies show that, compared with the inverse propensity score weighted estimator based on the non-probability sample, the proposed estimator has smaller absolute relative bias, variance and mean squared error, all of which are very close to those of the population estimator based on the true probability sample. The absolute relative bias of the variance estimator and the coverage rate of 95% confidence intervals for the proposed estimator are also close to those of the estimator based on the true probability sample, which indicates that the proposed method performs well.
Keywords: non-probability sample; probability sample; superpopulation local polynomial model; mass imputation; multi-source data
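The two-step procedure described in the abstract (fit a local polynomial model on the non-probability sample, impute the target variable for every unit of the probability sample, then estimate the population quantity from the imputed probability sample) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: a single covariate, a local polynomial of degree 1 (local linear regression), a Gaussian kernel with a fixed bandwidth `h`, and a Horvitz-Thompson estimator of the population mean with design weights 1/π_i and a known population size N. The function names `local_linear_fit` and `mass_imputation_mean` are hypothetical, and bandwidth selection and variance estimation are omitted. This is an illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the smoothing kernel (assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_linear_fit(x0: float, xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    """Local linear (degree-1 local polynomial) estimate of E[y | x = x0].

    Solves the kernel-weighted least-squares problem
        min_{b0, b1} sum_i K((x_i - x0) / h) * (y_i - b0 - b1 * (x_i - x0))**2
    in closed form and returns the intercept b0, i.e. the fitted value at x0.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        d = xi - x0
        w = gaussian_kernel(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    denom = s0 * s2 - s1 * s1
    if denom <= 0.0:
        raise ValueError("need at least two distinct x values with positive kernel weight")
    return (s2 * t0 - s1 * t1) / denom


def mass_imputation_mean(
    xs_np: Sequence[float],      # covariate in the non-probability sample
    ys_np: Sequence[float],      # observed target variable in the non-probability sample
    xs_prob: Sequence[float],    # covariate in the probability sample (y is missing there)
    incl_probs: Sequence[float], # inclusion probabilities pi_i of the probability sample
    pop_size: float,             # population size N (assumed known)
    h: float,                    # kernel bandwidth (assumed fixed)
) -> float:
    """Impute y for each probability-sample unit with the model fitted on the
    non-probability sample, then form (1/N) * sum_i y_hat_i / pi_i."""
    total = 0.0
    for x, pi in zip(xs_prob, incl_probs):
        y_hat = local_linear_fit(x, xs_np, ys_np, h)  # mass imputation step
        total += y_hat / pi                           # design weight d_i = 1 / pi_i
    return total / pop_size
```

Because a local linear fit reproduces an exactly linear relationship, a toy check with y = 2 + 3x on the non-probability sample gives imputed values on that line, and the estimated mean equals 2 + 3 times the design-weighted mean of x in the probability sample. In practice the bandwidth would be chosen by a data-driven rule, and the variance of the estimator would need to account for both the sampling design and the model fit.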