Research on the Governance of Training Data as the Core Driving Force of Generative Artificial Intelligence
[Purpose/significance] Current research pays little attention to the governance of training data for generative artificial intelligence. However, the training data lifecycle carries numerous risks that cannot be ignored and urgently need effective governance. [Methods/process] Having demonstrated that training data is the core driving force of generative artificial intelligence, this paper uses the theoretical model of the data lifecycle to comprehensively summarize the possible risk patterns in the training data lifecycle. It then analyzes the causes of these risks from three perspectives: the intrinsic characteristics of the training data, ecological factors, and the operational factors of generative AI developers. [Results/conclusion] The analysis finds that the fragmented nature and biases of the data are the starting points of risk; the ecological imbalance of the data is an external cause of risk; and "black box" training data, biased data labeling, and lax data desensitization are internal causes of risk. Therefore, a comprehensive risk governance scheme that targets the characteristics of training data and encompasses law, the market, community norms, and frameworks can be constructed using the "compassionate dots" framework.
training data; generative artificial intelligence; data governance; ChatGPT