
Exploring the Scientific Nature and Scientific Questions of Data Science

As an emerging academic field, data science has drawn attention to the question of its scientific nature, yet its scientific questions have not been clearly stated. This paper examines the scientific nature of data science from four aspects: scientific research paradigms and methodologies; falsifiability and reproducibility; scientific spirit and rapid iteration; and scientific research programmes and theoretical systems. It also answers the question of why data science is an emerging science. Building on this foundation, and drawing on the DIKW model (data-information-knowledge-wisdom pyramid or hierarchy), the DMP model (data-model-problem), the statistical and machine learning methodologies of data science, and the processes and activities of data science, the paper proposes seven core scientific questions of data science: whether explanation comes first, comes afterward, or is absent; whether the problem is aligned to the data or the data to the problem; whether to place more trust in data or in models; whether to prioritize performance or interpretability; how to partition data; how to use known data to solve problems involving unknown data; and whether humans are in the loop or out of the loop. Finally, four recommendations for data science research are offered: focusing on theoretical research into data science itself; further separating and specializing the science, technology, and engineering of data; strengthening the theory and practice of data science empowered by artificial intelligence; and linking data science as a discipline (Data Science as A Discipline) with data science within other disciplines (Data Science Within A Discipline).

Keywords: Data science; Scientific nature; Scientific questions; DIKW model

Author: Chaolemen (朝乐门)


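Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education (Renmin University of China), Beijing 100872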

School of Information Resource Management, Renmin University of China, Beijing 100872


Funding: National Natural Science Foundation of China (72074214)

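Journal: Computer Science (计算机科学)
Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
Indexing: CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(1)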