Copyright Data Dilemma of Building High-Quality Data System for AI: Present Situation, Coping Strategies, and Implementation Path
[Purpose/Significance] The resolution of the Third Plenary Session of the 20th Central Committee of the Communist Party of China explicitly proposed improving the policy and governance systems that promote the development of strategic industries such as artificial intelligence (AI). In recent years, the conflict between AI companies' demand for copyrighted data and copyright holders' efforts to protect that data has become increasingly apparent, and lawsuits and disputes over copyright infringement involving AI have arisen around the world. The copyright protection dilemma of AI training data has become a bottleneck that urgently needs to be resolved in building a high-quality data system for AI. [Method/Process] Drawing on academic research and industrial practice concerning the copyright protection of AI data, this study systematically summarizes six representative approaches to the copyright dilemma of AI training data and compares their advantages, disadvantages, and applicability. The six approaches are: signing a license agreement between the two parties, initiating special plans or forming alliances, introducing a copyright notice mechanism, introducing a copyright risk guarantee mechanism, substituting synthetic data, and applying copyright detection tools to large language models. The comparison shows that no single approach is optimal, in the sense of both encouraging the supply of copyrighted AI training data and protecting the copyright of that data. [Results/Conclusions] To provide a reference for increasing the supply of copyrighted data for AI, formulating relevant policies, and promoting related work, this study proposes a general implementation path for building a high-quality data system for AI that addresses the copyright dilemma of AI training data. The path is based on the comparative analysis of the six representative approaches, combined with China's four unique advantages. It includes: 1) integrating existing platforms to build a national-level integrated service platform for copyright data for AI, with state-owned enterprises (SOEs) under the direct administration of the central government taking the lead in establishing a national copyright data alliance and connecting copyright data to the platform; 2) collaborating with local pilots of data intellectual property rights to explore and promote comprehensive reform pilot programs for copyright data adapted to the development of AI, and continuously strengthening cooperation and the willingness to cooperate between AI enterprises and copyright holders; 3) focusing on principled or critical issues, establishing and improving legislation related to copyright data for AI, and promoting industry self-regulation.
artificial intelligence; data system for AI; copyright protection; copyright data; data elements