Types of Interviewers and Data Quality Prediction in Social Surveys
This paper discusses how to build a paradata-based data quality control system and a data quality prediction model, with attention to the type of interviewer employed in social surveys. Building on paradata such as GPS records and facial recognition, we first construct a data quality control system along three dimensions (interview length, response burden, and interviewer behavior) and explore the potential factors influencing data quality in a multivariate regression framework. Second, the prediction analysis shows that a machine learning model based on data quality indicators and interviewees' characteristics performs well in predicting the quality of data collected by student interviewers, but not the quality of data collected by community workers. We further show that the size of the initial training set and the proportion of new samples used to update the training set affect prediction accuracy: a larger initial training set and a larger share of updating samples are associated with higher prediction consistency. Updating the initial training set improves prediction accuracy in the middle stage of a survey. The findings have implications for interviewer selection, the construction of quality control indicators, data quality prediction, and the cost-benefit analysis of quality control in face-to-face surveys.
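The role of the initial training set and of the updating share can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the model used in this paper: the data are synthetic, the two indicators (interview length and share of skipped items) are hypothetical stand-ins for the quality indicators, and a simple nearest-centroid rule replaces the machine learning model. The function names `make_synthetic_interviews`, `fit_centroids`, `predict`, and `evaluate_with_updates` are likewise hypothetical.

```python
import random


def make_synthetic_interviews(n, seed):
    """Generate hypothetical (features, label) pairs; label 1 = low-quality interview."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        low_quality = rng.random() < 0.3
        # Assumed indicators: interview length (minutes) and share of skipped items.
        length = rng.gauss(18 if low_quality else 35, 6)
        skipped = rng.gauss(0.25 if low_quality else 0.08, 0.05)
        rows.append(((length, skipped), int(low_quality)))
    return rows


def fit_centroids(train):
    """Nearest-centroid stand-in for the prediction model: per-class feature means."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in train if y == label]
        if points:  # skip a class that is absent from the training set
            centroids[label] = tuple(sum(col) / len(points) for col in zip(*points))
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def evaluate_with_updates(data, initial_size, update_share, batch_size):
    """Predict batch by batch, then move a share of each batch into the training set.

    Returns (overall accuracy on the predicted cases, final training set size).
    """
    train = list(data[:initial_size])
    remaining = data[initial_size:]
    correct = 0
    for start in range(0, len(remaining), batch_size):
        batch = remaining[start:start + batch_size]
        model = fit_centroids(train)  # refit on the current training set
        correct += sum(predict(model, x) == y for x, y in batch)
        # Assume a share of the batch is manually verified and added to the training set.
        n_update = round(update_share * len(batch))
        train.extend(batch[:n_update])
    return correct / len(remaining), len(train)


if __name__ == "__main__":
    data = make_synthetic_interviews(500, seed=0)
    for share in (0.0, 0.2, 0.5):
        accuracy, train_size = evaluate_with_updates(data, 50, share, 50)
        print(f"update share {share:.1f}: accuracy {accuracy:.3f}, training size {train_size}")
```

Varying `initial_size` and `update_share` in this sketch mirrors the two design choices discussed above. Because the data are synthetic, its output says nothing about the empirical results reported in the paper.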