Statistical Inference with Integrated Probability and Non-probability Samples: Kernel Matching Method
Data collection methods based on web convenience surveys and big data platforms have been widely adopted in social science research. However, the samples obtained are essentially non-probability samples, and finite population quantile estimates based on them face potential biases such as coverage bias and self-selection bias. In recent years there has been considerable discussion of integrating data from probability surveys and non-probability surveys to estimate finite population quantiles, but many issues remain. This paper considers the setting in which only covariates are measured for the probability sample, while both covariates and study variables are measured for the non-probability sample. It first introduces methods for constructing weights for the non-probability sample within the propensity score framework, including inverse propensity score weighting, grouped inverse propensity score weighting, and propensity score matching. Building on propensity score kernel matching, a kernel bandwidth selection method is then proposed that balances covariates between the probability sample and the non-probability sample, and it is used to construct propensity score kernel matching weights for the non-probability sample. Simulation results indicate that weighting non-probability samples by propensity score matching outperforms inverse propensity score weighting, particularly when the propensity score model is estimated by unweighted logistic regression. Inverse propensity score weighting can mitigate the bias of non-probability samples when the propensity score model is estimated by weighted logistic regression, but it tends to have a larger standard error and a lower coverage rate than propensity score matching methods. When the propensity score model is estimated by unweighted logistic regression, inverse propensity score weighting cannot reduce the bias of the non-probability sample. For k-nearest neighbors (kNN) propensity score matching estimators, both the standard error and the coverage rate are adversely affected when k is small. In contrast, kernel matching methods based on propensity scores substantially reduce bias and standard error while improving the coverage rate for non-probability samples. Furthermore, the proposed kernel bandwidth selection method effectively decreases the relative bias, the absolute bias, and the standard error of the estimates.
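
To make the idea of propensity score kernel matching weights concrete, the following is a minimal sketch and not the paper's estimator. It assumes that propensity scores have already been estimated for both samples, that a Gaussian kernel is used, and that the bandwidth is supplied by the user rather than chosen by the covariate-balancing selection method described above. The function names `kernel_matching_weights` and `weighted_quantile`, and the specific rule of spreading each probability-sample design weight over non-probability units in proportion to kernel similarity, are illustrative assumptions.

```python
import math


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel (an assumed choice of kernel)."""
    return math.exp(-0.5 * u * u)


def kernel_matching_weights(ps_prob, design_weights, ps_nonprob, bandwidth):
    """Sketch of kernel matching weights for non-probability units.

    Each probability-sample unit i, with design weight d_i and propensity
    score p_i, distributes d_i over the non-probability units j in
    proportion to K((p_i - p_j) / bandwidth). This form is an assumption
    made for illustration only.
    """
    weights = [0.0] * len(ps_nonprob)
    for p_i, d_i in zip(ps_prob, design_weights):
        k = [gaussian_kernel((p_i - p_j) / bandwidth) for p_j in ps_nonprob]
        total = sum(k)
        if total == 0.0:
            # No non-probability unit is close enough to receive this weight.
            continue
        for j, k_ij in enumerate(k):
            weights[j] += d_i * k_ij / total
    return weights


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, w in pairs:
        cumulative += w
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from the paper.
    ps_prob = [0.2, 0.5, 0.8]
    design_weights = [10.0, 20.0, 30.0]
    ps_nonprob = [0.25, 0.45, 0.75, 0.85]
    y_nonprob = [1.0, 2.0, 3.0, 4.0]

    w = kernel_matching_weights(ps_prob, design_weights, ps_nonprob, 0.1)
    print("weights:", w)
    print("weighted median:", weighted_quantile(y_nonprob, w, 0.5))
```

Under this construction the non-probability weights sum to the total of the design weights whenever every probability-sample unit has at least one non-probability unit with non-zero kernel value. A smaller bandwidth concentrates each design weight on the closest matches, while a larger one spreads it more evenly, which is the trade-off a bandwidth selection rule has to resolve.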