统计与信息论坛 (Statistics and Information Forum), 2024, Vol. 39, Issue (10): 3-12.

Statistical Inference with Integrated Probability and Non-probability Samples: Kernel Matching Method

Wang Jun¹, Jin Yongjin²

Author Information

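  • 1. Institute of Population and Labor Economics, Chinese Academy of Social Sciences, Beijing 100006; Center for Human Resources Research, Chinese Academy of Social Sciences, Beijing 100006
  • 2. Center for Applied Statistics, Renmin University of China, Beijing 100872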

Abstract (Chinese)

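Data collection methods based on web convenience surveys and big data platforms have developed widely in practice, but the samples they yield are essentially non-probability samples. Using non-probability samples to infer characteristics of the target population is subject to potential biases, such as coverage bias and self-selection bias. In recent years there has been considerable discussion of integrating data from probability and non-probability surveys to estimate finite population characteristics, but many problems remain. Building on existing research, and in the setting where auxiliary variables are measured in both the non-probability and the probability samples but the study variable is measured only in the non-probability sample, this paper introduces weight construction methods based on the propensity score framework. Building on propensity score kernel matching, it proposes a kernel bandwidth selection method based on covariate balance across the integrated probability and non-probability samples, and uses it to construct propensity score kernel matching weights for the non-probability sample. Simulation results show that propensity score kernel matching substantially reduces the bias of the non-probability sample, and that the proposed bandwidth method based on covariate balance in the integrated sample effectively reduces the relative bias, the absolute relative bias, and the standard deviation of the estimator.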
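
The kernel matching step summarized above can be illustrated with a short sketch. The abstract does not specify the kernel, the balance criterion, or the search procedure, so everything here is an assumption made for illustration: a Gaussian kernel, propensity scores already estimated for both samples, a fixed grid of candidate bandwidths, and a loss equal to the sum of squared standardized differences between the design-weighted covariate means of the probability sample and the matching-weighted covariate means of the non-probability sample. The function names (`kernel_matching_weights`, `balance_loss`, `select_bandwidth`) and the simulated data are hypothetical and are not the authors' implementation.

```python
import numpy as np


def kernel_matching_weights(p_prob, d_prob, p_nonprob, h):
    """Propensity-score kernel matching weights (Gaussian kernel assumed).

    Each probability-sample unit i (design weight d_i) spreads d_i over the
    non-probability units j in proportion to K((p_i - p_j) / h), so
    w_j = sum_i d_i * K_ij / sum_k K_ik.
    """
    u = (p_prob[:, None] - p_nonprob[None, :]) / h        # shape (n_A, n_B)
    log_k = -0.5 * u ** 2                                  # log Gaussian kernel up to a constant
    log_k -= log_k.max(axis=1, keepdims=True)              # shift each row to avoid underflow
    k = np.exp(log_k)
    share = k / k.sum(axis=1, keepdims=True)               # each row of shares sums to 1
    return d_prob @ share                                  # w_j = sum_i d_i * share_ij


def balance_loss(w, x_nonprob, d_prob, x_prob):
    """Hypothetical criterion: sum of squared standardized mean gaps."""
    mean_b = w @ x_nonprob / w.sum()
    mean_a = d_prob @ x_prob / d_prob.sum()
    scale = x_prob.std(axis=0)
    scale[scale == 0] = 1.0                                # guard constant covariates
    return float(np.sum(((mean_b - mean_a) / scale) ** 2))


def select_bandwidth(p_prob, d_prob, x_prob, p_nonprob, x_nonprob, grid):
    """Pick the grid bandwidth whose weights best balance the covariates."""
    losses = [
        balance_loss(kernel_matching_weights(p_prob, d_prob, p_nonprob, h),
                     x_nonprob, d_prob, x_prob)
        for h in grid
    ]
    return float(grid[int(np.argmin(losses))]), losses


# Synthetic illustration (hypothetical data, not from the paper)
rng = np.random.default_rng(0)
n_a, n_b = 200, 500
x_prob = rng.normal(size=(n_a, 2))                         # covariates, probability sample
x_nonprob = rng.normal(loc=0.5, size=(n_b, 2))             # shifted non-probability sample
d_prob = np.full(n_a, 50.0)                                # equal design weights
y_nonprob = 1.0 + x_nonprob @ np.array([2.0, -1.0]) + rng.normal(size=n_b)

# Stand-in propensity scores (hypothetical); in practice they come from a fitted model
p_prob = 1.0 / (1.0 + np.exp(-x_prob.sum(axis=1)))
p_nonprob = 1.0 / (1.0 + np.exp(-x_nonprob.sum(axis=1)))

grid = np.array([0.005, 0.01, 0.02, 0.05, 0.1, 0.2])
h_star, losses = select_bandwidth(p_prob, d_prob, x_prob, p_nonprob, x_nonprob, grid)
w = kernel_matching_weights(p_prob, d_prob, p_nonprob, h_star)
y_mean_hat = float(w @ y_nonprob / w.sum())                # weighted mean of the study variable
```

Because each row of kernel shares sums to one, the matching weights always add up to the total design weight of the probability sample, and subtracting the row maximum on the log scale only guards against underflow without changing the shares. Very small bandwidths approach nearest-neighbor matching, while very large ones spread each design weight almost evenly over the non-probability sample; a balance criterion of this kind picks a point between these extremes.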

Abstract

Data collection methods based on web convenience surveys and big data platforms have been widely adopted in social science research. However, the samples they produce are essentially non-probability samples. Estimating finite population quantities from such non-probability samples is subject to potential biases, such as coverage bias and self-selection bias. In recent years, there has been considerable discussion of integrating data from probability and non-probability surveys to estimate finite population quantities, but many issues remain. This paper first introduces methods for constructing weights for a non-probability sample within the propensity score framework, including inverse propensity score weighting, grouped inverse propensity score weighting, and propensity score matching, in the setting where only covariates are measured in the probability sample, whereas both covariates and study variables are measured in the non-probability sample. Building on propensity score kernel matching, it proposes a kernel bandwidth selection method that balances covariates between the probability and non-probability samples, and uses it to construct propensity score kernel matching weights for the non-probability sample. Simulation results indicate that weighting the non-probability sample by propensity score matching outperforms inverse propensity score weighting, particularly when the propensity score model is estimated by unweighted logistic regression. When the propensity score model is estimated by weighted logistic regression, inverse propensity score weighting can mitigate the bias of the non-probability sample, but it tends to have a larger standard error and a lower coverage rate than propensity score matching; when the model is estimated by unweighted logistic regression, inverse propensity score weighting cannot reduce the bias of the non-probability sample. For k-nearest-neighbor (kNN) propensity score matching estimators, both the standard error and the coverage rate are adversely affected when k is small. In contrast, kernel matching based on propensity scores substantially reduces bias and standard error while improving the coverage rate for the non-probability sample. Furthermore, the proposed kernel bandwidth selection method effectively decreases the relative bias, the absolute relative bias, and the standard error of the estimates.
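
The comparison between inverse propensity score weighting and k-nearest-neighbor matching described in the abstract can be made concrete with the following sketch. It is an illustration under stated assumptions, not the authors' code: propensity scores are taken as given, the helper names `ipw_weights`, `knn_matching_weights`, `hajek_mean`, and `kish_neff` are hypothetical, and the data are simulated. Kish's effective sample size, (Σw)²/Σw², is used only as a rough indicator of how concentrated the weights are.

```python
import numpy as np


def ipw_weights(p_nonprob):
    """Inverse propensity weights w_j = 1 / pi_j.

    Assumes pi_j estimates the probability that a population unit with j's
    covariates enters the non-probability sample; scores from an unweighted
    model fitted to the stacked samples do not have this meaning.
    """
    return 1.0 / p_nonprob


def knn_matching_weights(p_prob, d_prob, p_nonprob, k):
    """Each probability unit i passes d_i / k to its k nearest
    non-probability units in propensity score (ties broken by index)."""
    dist = np.abs(p_prob[:, None] - p_nonprob[None, :])          # shape (n_A, n_B)
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]     # shape (n_A, k)
    w = np.zeros(p_nonprob.shape[0])
    # ravel() is row-major, so repeating each d_i / k k times lines up with it
    np.add.at(w, nearest.ravel(), np.repeat(d_prob / k, k))
    return w


def hajek_mean(w, y):
    """Weighted mean sum(w * y) / sum(w)."""
    return float(w @ y / w.sum())


def kish_neff(w):
    """Kish effective sample size (sum w)^2 / sum w^2."""
    return float(w.sum() ** 2 / np.sum(w ** 2))


# Synthetic illustration (hypothetical data, not from the paper)
rng = np.random.default_rng(1)
p_prob = rng.uniform(0.05, 0.6, size=100)
d_prob = np.full(100, 100.0)
p_nonprob = rng.uniform(0.05, 0.6, size=300)
y_nonprob = 2.0 + 3.0 * p_nonprob + rng.normal(scale=0.5, size=300)

for k in (1, 10):
    w_k = knn_matching_weights(p_prob, d_prob, p_nonprob, k)
    print(f"kNN k={k}: mean={hajek_mean(w_k, y_nonprob):.3f}, n_eff={kish_neff(w_k):.1f}")
print(f"IPW: mean={hajek_mean(ipw_weights(p_nonprob), y_nonprob):.3f}")
```

With k = 1 each probability unit's design weight lands on a single non-probability unit, so the weights tend to be more uneven than with larger k, which is consistent with the inflated standard error reported for small k. Kernel matching replaces the hard cut-off at k neighbors with smoothly decaying shares controlled by the bandwidth.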

Keywords

non-probability sample/integrated sample/bandwidth selection/statistical inference/kernel matching


Funding

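National Social Science Fund of China (19BTJ012)

Major Economic and Social Survey Project of the Chinese Academy of Social Sciences (GQDC2023022)

Comprehensive Funding Project for Research Institute Laboratories of the Chinese Academy of Social Sciences (2024SYZH008)

National Statistical Science Research Project (2023LY093)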

Year of Publication

2024
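Journal: 统计与信息论坛 (Statistics and Information Forum)
Sponsors: Xi'an University of Finance and Economics; Higher Education Branch of the Chinese Association of Statistics Education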

Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals (北大核心)
Impact factor: 0.857
ISSN: 1007-3116
Number of references: 7