Research on Integrated Inference for Probability and Non-probability Survey Samples: A Kernel Matching Method

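Data collection based on online convenience surveys and big data platforms has developed widely in practice, but the resulting samples are essentially non-probability samples. Using non-probability samples to infer characteristics of the target population faces potential biases, such as coverage bias and self-selection bias. In recent years there has been much discussion of combining data from probability and non-probability surveys to estimate finite population characteristics, yet many problems remain. Building on existing research, and in the setting where auxiliary variables are measured for both the non-probability and probability samples but the study variable is measured only for the non-probability sample, this paper introduces weight construction methods based on the propensity score framework. Building on propensity score kernel matching, it proposes a kernel bandwidth selection method based on covariate balance across the combined probability and non-probability samples, and uses it to construct propensity score kernel matching weights for the non-probability sample. Simulation results show that the propensity score kernel matching approach significantly reduces the bias of the non-probability sample, and that the proposed covariate-balancing bandwidth method effectively reduces the relative bias, absolute relative bias, and standard deviation of the estimator.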
Statistical Inference with Integrated Probability and Non-probability Samples: Kernel Matching Method
Data collection methods based on web convenience surveys and big data platforms have been widely adopted in social science research. However, the samples they yield are essentially non-probability samples. Estimates of finite population characteristics based on such samples face potential biases, such as coverage bias and self-selection bias. In recent years, there has been considerable discussion of integrating data from probability and non-probability surveys to estimate finite population characteristics, but many issues remain. This paper considers the setting in which only covariates are measured for the probability sample, while both covariates and study variables are measured for the non-probability sample. It first introduces methods for constructing weights for the non-probability sample within the propensity score framework, including inverse propensity score weighting, grouped inverse propensity score weighting, and propensity score matching. Building on propensity score kernel matching, it then proposes a kernel bandwidth selection method that balances covariates between the probability and non-probability samples, and uses it to construct propensity score kernel matching weights for the non-probability sample. Simulation results indicate that weighting the non-probability sample by propensity score matching outperforms inverse propensity score weighting, particularly when the propensity score model is estimated by unweighted logistic regression, in which case inverse propensity score weighting fails to reduce the bias of the non-probability sample. When the propensity score model is estimated by weighted logistic regression, inverse propensity score weighting can mitigate the bias, but it tends to have a larger standard error and a lower coverage rate than propensity score matching methods. For k-nearest-neighbor (kNN) propensity score matching estimators, both the standard error and the coverage rate are adversely affected when k is small. In contrast, kernel matching methods based on propensity scores substantially reduce bias and standard error while improving the coverage rate for non-probability samples. Furthermore, the proposed kernel bandwidth selection method effectively decreases the relative bias, the absolute relative bias, and the standard error of the estimates.
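The abstract does not give formulas, so the following is only a minimal sketch of one common way to build kernel matching weights and to pick a bandwidth by covariate balance. Several things here are assumptions rather than the authors' method: each probability-sample design weight is spread over the non-probability units in proportion to a Gaussian kernel of the score distance; the balance criterion is a sum of squared standardized mean differences; and the function names (`kernel_matching_weights`, `balance_loss`, `select_bandwidth`), the bandwidth grid, and the toy data are all hypothetical. The demo uses a single covariate as a stand-in for a fitted propensity score.

```python
import numpy as np


def kernel_matching_weights(score_np, score_p, d_p, h):
    """Spread each probability-sample design weight over non-probability units.

    score_np: scores of the non-probability units, shape (n_np,)
    score_p:  scores of the probability-sample units, shape (n_p,)
    d_p:      design weights of the probability sample, shape (n_p,)
    h:        kernel bandwidth (assumed Gaussian kernel)
    """
    score_np = np.asarray(score_np, dtype=float)
    score_p = np.asarray(score_p, dtype=float)
    d_p = np.asarray(d_p, dtype=float)
    # diff[j, i]: scaled score distance between prob unit j and non-prob unit i
    diff = (score_p[:, None] - score_np[None, :]) / h
    log_k = -0.5 * diff**2
    # Shift each row so its largest kernel value is 1 (avoids 0/0 for tiny h)
    log_k -= log_k.max(axis=1, keepdims=True)
    k = np.exp(log_k)
    share = k / k.sum(axis=1, keepdims=True)  # each row sums to 1
    return d_p @ share  # weight of each non-probability unit


def balance_loss(x_np, w_np, x_p, d_p):
    """Assumed balance criterion: squared standardized mean differences."""
    target = np.average(x_p, axis=0, weights=d_p)
    estimate = np.average(x_np, axis=0, weights=w_np)
    scale = x_p.std(axis=0)
    return float(np.sum(((estimate - target) / scale) ** 2))


def select_bandwidth(score_np, score_p, d_p, x_np, x_p, grid):
    """Return the bandwidth in `grid` with the smallest balance loss."""
    losses = [
        balance_loss(x_np, kernel_matching_weights(score_np, score_p, d_p, h), x_p, d_p)
        for h in grid
    ]
    return grid[int(np.argmin(losses))]


# Toy illustration (simulated data, not from the paper)
rng = np.random.default_rng(0)
N = 20_000
x = rng.normal(size=N)
y = 2.0 + x + rng.normal(scale=0.5, size=N)
in_np = rng.random(N) < 1.0 / (1.0 + np.exp(3.0 - x))  # self-selection on x
idx_p = rng.choice(N, size=500, replace=False)          # simple random sample
d_p = np.full(500, N / 500)

grid = [0.05, 0.1, 0.2, 0.5, 1.0]
x_np, x_p = x[in_np], x[idx_p]
h_best = select_bandwidth(x_np, x_p, d_p, x_np[:, None], x_p[:, None], grid)
w = kernel_matching_weights(x_np, x_p, d_p, h_best)

print("population mean:", y.mean())
print("naive non-probability mean:", y[in_np].mean())
print("kernel-weighted mean:", np.average(y[in_np], weights=w))
```

Because each row of the share matrix sums to one, the resulting weights add up to the total of the probability-sample design weights, which keeps the weighted non-probability sample on the scale of the estimated population size.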

non-probability sample; integrated sample; bandwidth selection; statistical inference; kernel matching

Wang Jun (王俊), Jin Yongjin (金勇进)

Institute of Population and Labor Economics, Chinese Academy of Social Sciences, Beijing 100006

Human Resources Research Center, Chinese Academy of Social Sciences, Beijing 100006

Center for Applied Statistics, Renmin University of China, Beijing 100872

non-probability sample; integrated data; bandwidth selection; statistical inference; kernel matching

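National Social Science Fund of China project; Chinese Academy of Social Sciences Major Economic and Social Survey project; Chinese Academy of Social Sciences Institute Laboratory Comprehensive Funding project; National Statistical Science Research project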

19BTJ012; GQDC2023022; 2024SYZH008; 2023LY093

2024

统计与信息论坛 (Journal of Statistics and Information)
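Xi'an University of Finance and Economics (西安财经学院); Higher Education Branch, China Statistical Education Society (中国统计教育学会高教分会)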

CSTPCD; CSSCI; CHSSCD; Peking University Core Journals (北大核心)
Impact factor: 0.857
ISSN:1007-3116
Year, volume (issue): 2024, 39(10)