Research on the Governance of Training Data as the Core Driving Force of Generative Artificial Intelligence

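Abstract: [Purpose/significance] Existing research pays little attention to the governance of training data for generative artificial intelligence. Yet the training-data lifecycle harbors many risks that cannot be ignored and that urgently require effective governance. [Method/process] Having argued that training data is the core driving force of generative AI, the article applies a theoretical model of the data lifecycle to comprehensively categorize the forms of risk that may arise across the training-data lifecycle. It then analyzes the causes of these risks from three angles: the intrinsic characteristics of training data, ecological factors, and operational factors on the part of generative AI developers. [Results/conclusion] The fragmentation and bias inherent in the data are the starting point of risk; ecological imbalance in the data is the external cause; and training data hidden in a "black box", skewed data labeling, and lax data desensitization are the internal causes. Accordingly, Lawrence Lessig's "pathetic dot" framework can be used to build a risk governance scheme, tailored to the characteristics of training data, that combines law, the market, community norms, and architecture.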

English title: Research on the Governance of Training Data as the Core Driving Force of Generative Artificial Intelligence

English abstract: [Purpose/significance] Current research pays little attention to the governance of training data for generative artificial intelligence. However, the training-data lifecycle contains numerous risks that cannot be ignored and urgently require effective governance. [Methods/process] Having demonstrated that training data is the core driving force of generative artificial intelligence, this paper uses a theoretical model of the data lifecycle to comprehensively summarize the risk patterns that may arise across the training-data lifecycle. It then analyzes the causes of these risks from the perspectives of the intrinsic characteristics of training data, ecological factors, and the operational factors of generative AI developers. [Results/conclusion] The fragmented nature and biases of the data are the starting point for risk; ecological imbalance in the data is an external cause; and "black box" training data, biased data labeling, and lax data desensitization are internal causes. Therefore, targeting the characteristics of training data, Lawrence Lessig's "pathetic dot" framework can be used to construct a comprehensive risk governance scheme encompassing law, the market, community norms, and architecture.

English keywords: training data; generative artificial intelligence; data governance; ChatGPT

Authors: 陈锐 (Chen Rui), 江奕辉 (Jiang Yihui)

Affiliation: School of Law, Chongqing University, Chongqing 400044

Funding: Special Task Project of the Ministry of Justice (No. 21SFB4004); Fundamental Research Funds for the Central Universities (No. 2021CDSKXYFX009)

Publication year: 2024

DOI: 10.12154/j.qbzlgz.2024.04.009

Journal: 情报资料工作 (Information and Documentation Services), Renmin University of China

Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals (北大核心)

Impact factor: 2.201

ISSN: 1002-0314

Year, volume (issue): 2024, 45(4)

Times cited: 1
References: 18