Legal Risks and Meta-regulation of Generative AI Training Data
Generative AI, represented by ChatGPT, relies on massive training data to achieve iterative upgrades of its models, and the quality and quantity of training data directly determine the performance and generalization ability of generative AI. However, the training data itself harbors hidden risks such as questionable source legitimacy, unreliable quality, and scale bias. Neither the self-regulation path nor the government-regulation path can keep pace with the market layout and rapid evolution of generative AI, so it is urgent to meta-regulate generative AI training data under the concept of inclusiveness and prudence. Under the meta-regulation approach, the state guides model developers, through regulations, to embed data protection by design and technological ethics into the training data of generative AI, thereby extending data protection from the utilization stage to the research and development stage. Through measures such as credible data sourcing, data classification, and data impact assessment, it encourages model developers to engage in self-reflection and realizes meta-regulation through a data protection regulatory sandbox.