Legal Risks and Meta-regulation of Generative AI Training Data
Generative AI, represented by ChatGPT, relies on massive training data to achieve iterative upgrades of its models, and the quality and quantity of training data directly determine the performance and generalization ability of generative AI. However, the training data itself harbors hidden risks such as questionable source legitimacy, unreliable quality, and scale bias. Neither the self-regulation path nor the government-regulation path can keep pace with the market layout and rapid evolution of generative AI, so it is urgent to meta-regulate generative AI training data under the concept of inclusiveness and prudence. Under the meta-regulation approach, the state guides model developers, through regulations, to embed data protection by design and technological ethics into the training data of generative AI, thereby extending data protection from the utilization stage to the research and development stage. Through measures such as credible data sourcing, data classification, and data impact assessment, it encourages model developers to engage in self-reflection and realizes meta-regulation through a data protection regulatory sandbox.