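A Preliminary Exploration of Building a New Data Research Ecosystem in the Era of Computational Social Science: A Two-Way Empirical Test Based on Big Data and Small Data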

Is Social Survey Method Still Reliable in Computational Social Science Era? A Two-way Empirical Test Based on Small Data and Big Data
In the era of computational social science, big data are increasingly used in social research, prompting scholars to reconsider the role and function of small data (social survey data). This paper asks whether small data can still play a meaningful role in the age of big data and, building on that question, discusses how to construct a data ecosystem for the era of computational social science. The study focuses on a single community and, drawing on the theory of equivalent communities, constructs two equivalent communities, one from big data and one from small data, each independently reflecting the overall characteristics of the study population. For that community, platform big data were obtained while, during the same period, household surveys were conducted in the community to collect questionnaire data. The Jaccard coefficient was used to calculate the distance between each pair of empirical data distributions, providing a two-way empirical validation. The results show a good overall match between the two datasets, and the variables that did not match also received reasonable sociological explanations. This suggests that both small data and big data can be used to reflect the aggregate characteristics of a society. In the era of computational social science, the domains of application of big data and small data should be clearly defined and the two organically integrated, helping to build a new ecosystem for data research. Social researchers should choose the appropriate type of data (single or mixed) according to their research questions, thereby improving the effectiveness and robustness of knowledge discovery.
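The abstract does not spell out how the Jaccard coefficient is applied to empirical distributions. A minimal sketch, assuming the weighted (Ruzicka) generalization J(p, q) = Σ min(p_i, q_i) / Σ max(p_i, q_i) computed over the relative frequencies of each category of a variable, with distance 1 - J, might look like the following. The function names and the toy category counts are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter


def empirical_distribution(values):
    """Relative frequency of each category in a list of observations."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def weighted_jaccard_distance(p, q):
    """Weighted (Ruzicka) Jaccard distance between two discrete distributions.

    Assumption: this generalization stands in for the paper's unspecified
    procedure. p and q map category -> probability; a category absent from
    one side is treated as probability 0.
    """
    categories = set(p) | set(q)
    overlap = sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in categories)
    union = sum(max(p.get(c, 0.0), q.get(c, 0.0)) for c in categories)
    if union == 0:
        return 0.0  # both distributions empty: treat as identical
    return 1.0 - overlap / union


# Hypothetical toy data: one categorical variable (age group) observed in
# survey records (small data) and platform records (big data) for one community.
survey = ["18-34"] * 30 + ["35-59"] * 50 + ["60+"] * 20
platform = ["18-34"] * 40 + ["35-59"] * 45 + ["60+"] * 15

p = empirical_distribution(survey)
q = empirical_distribution(platform)
print(round(weighted_jaccard_distance(p, q), 4))  # prints 0.1818
```

Because the measure is symmetric, the same distance is obtained whichever dataset is treated as the reference; a distance of 0 means identical distributions and 1 means no shared categories.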

Lü Peng (吕鹏)

School of Economics and Management, Xinjiang University

Peking University Wuhan Institute for Artificial Intelligence, Wuhan 430075

Keywords: big data; social survey; two-way empirical test; data research ecosystem

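Funding: Major Project of the National Social Science Fund of China (237ZDA80); Central Government Guidance Program for Local Science and Technology Development (2023EGA035)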

2024

Journal: Xuehai (学海)
Jiangsu Academy of Social Sciences

Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals (北大核心)
Impact factor: 0.759
ISSN:1001-9790
Year, volume (issue): 2024 (4)