
Mass Imputation Inference of Non-Probability Samples and Probability Samples from the Perspective of Multi-Source Data Fusion

As society develops, nonresponse rates in probability samples keep rising, so the target variable may be missing. Meanwhile, the growth of big data and web surveys means that most of the samples obtained are non-probability samples. How to combine the two kinds of samples to make inference about the population has become a prominent problem in multi-source data fusion. Assuming a probability sample whose target variable is completely missing and a non-probability sample with complete data, this paper proposes fitting a superpopulation local polynomial model to the non-probability sample, using it to impute the missing target variable in the probability sample, and then estimating the population from the imputed probability sample. The asymptotic properties of the proposed estimator are also established. Simulation and empirical studies show that, compared with the inverse propensity score weighted estimator based on the non-probability sample, the proposed estimator has smaller absolute relative bias, variance, and mean squared error, and that these are close to those of the population estimator based on the true probability sample. For the proposed population mean estimator, the absolute relative bias of the variance estimator and the coverage rate of the 95% confidence interval are likewise close to the corresponding figures for the estimator based on the true probability sample, indicating that the method performs well.
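The imputation idea in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation. It assumes a single covariate, a degree-1 (local linear) fit with a Gaussian kernel, a fixed user-chosen bandwidth, and a Hájek-type weighted mean; none of these choices are specified in the abstract. The function names `local_linear_fit` and `mass_imputation_mean` are hypothetical.

```python
import numpy as np

def local_linear_fit(x_train, y_train, x_eval, bandwidth):
    """Local linear regression (degree-1 local polynomial), Gaussian kernel.

    Assumed setup: one covariate and a fixed bandwidth.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    preds = np.empty(len(x_eval))
    for j, x0 in enumerate(x_eval):
        d = x_train - x0                          # centre the covariate at x0
        w = np.exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian kernel weights
        X = np.column_stack([np.ones_like(d), d]) # local design matrix [1, x - x0]
        XtW = X.T * w                             # weight each observation
        beta = np.linalg.solve(XtW @ X, XtW @ y_train)
        preds[j] = beta[0]                        # intercept = fitted value at x0
    return preds

def mass_imputation_mean(x_np, y_np, x_p, design_weights, bandwidth):
    """Impute y for every probability-sample unit, then take a weighted mean.

    x_np, y_np     : covariate and target from the non-probability sample
    x_p            : covariate from the probability sample (target missing)
    design_weights : survey design weights of the probability sample
    The Hajek-type ratio form used here is an assumption of this sketch.
    """
    y_imputed = local_linear_fit(x_np, y_np, x_p, bandwidth)
    d = np.asarray(design_weights, dtype=float)
    return np.sum(d * y_imputed) / np.sum(d)

# Toy usage: the non-probability sample follows y = 2 + 3x exactly.
x_np = np.linspace(0.0, 1.0, 50)
y_np = 2.0 + 3.0 * x_np
x_p = np.array([0.2, 0.5, 0.8])
weights = np.array([10.0, 20.0, 10.0])
estimate = mass_imputation_mean(x_np, y_np, x_p, weights, bandwidth=0.2)
```

In practice the bandwidth would be chosen from the data (for example by cross-validation), and variance estimation would need to account for the imputation step, which this sketch does not attempt.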

non-probability sample; probability sample; superpopulation local polynomial model; mass imputation; multi-source data

Liu Zhan, Zhou Qing, Wang Lin, Pan Yingli


Faculty of Mathematics and Statistics, Hubei Key Laboratory of Applied Mathematics, Hubei University, Wuhan 430062


National Social Science Fund of China

21XTJ006

2024

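Journal of Systems Science and Mathematical Sciences
Academy of Mathematics and Systems Science, Chinese Academy of Sciences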


CSTPCD; Peking University Core Journals
Impact factor: 0.425
ISSN:1000-0577
Year, volume (issue): 2024, 44(2)
  • 29