Artificial Intelligence Alignment: What Can Economics Do?
AI alignment refers to aligning the development of AI technology with human interests: ensuring that AI understands human norms and values, grasps human wishes and intentions, and acts according to human will. Depending on the alignment target, AI alignment can be divided into different levels. In practice, there are two main methods for AI alignment: reinforcement learning from human feedback (RLHF) and constitutional AI (CAI). RLHF relies on human feedback to adjust AI behavior, while CAI guides AI through predetermined rules. Although both methods have achieved success in practice, they face challenges such as the diversity of human values and controversy over how the rules should be set.

Some of the literature equates AI alignment with AI value alignment, but this view is mistaken. From an economic perspective, the goal of making an AI agent's behavior serve human interests can be achieved either by changing its utility function or by changing the constraints it faces. This paper distinguishes between these two paths, referring to the former as AI value alignment and to the latter as AI incentive compatibility alignment.

This paper argues that economic theory offers useful references for both paths. Selecting which value goals to align to in an environment of diverse values has always been a challenge, and many results in social choice theory speak directly to this problem. For incentive compatibility alignment, people can influence AI agents by changing game rules, incentive structures, or information structures so that their behavior serves human interests; in this process, mechanism design, contract theory, and information design provide many useful references.

Mechanism design theory studies how to shape the outcomes of games through their rules. In the field of AI alignment, it can help design the interaction rules of multi-agent systems so that the overall behavior of the system serves human interests without changing the utility functions of individual agents. In practice, it has been applied to traffic planning for autonomous vehicles.

Contract theory studies how contractual arrangements promote cooperation between different parties. It suggests changing AI behavior by designing the allocation of rights and responsibilities and the incentive schemes between humans and AI. In the field of AI alignment, the design of reward schemes for multi-task AI agents and the Off-Switch Game both draw on contract theory.

Information design theory studies how to change equilibrium outcomes by altering information structures. Collaboration between humans and AI requires continuous information exchange, which enables humans to influence AI behavior by changing the information structure. Tools from information design, such as Bayesian persuasion, can support this kind of work.

In summary, AI agents resemble humans in many respects. As a science that studies human behavior, economics can provide many insights for AI alignment. The author hopes that this paper will encourage more economists to join this booming field and work together to promote the healthy development of AI.
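To make the distinction between the two paths concrete, consider a minimal formal sketch (the notation is illustrative, not the paper's). An AI agent chooses an action a from a feasible set A to maximize a utility function u:

\[
a^{*} \;=\; \arg\max_{a \in A} u(a).
\]

Value alignment replaces u with a utility \(u_H\) that encodes human values, so that \(a^{*} = \arg\max_{a \in A} u_H(a)\). Incentive compatibility alignment instead leaves u unchanged and reshapes the constraints, that is, the rules, incentives, or information that determine the feasible set, from A to some \(A'\) such that \(\arg\max_{a \in A'} u(a)\) coincides with the action humans prefer.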
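As an illustration of how social choice theory can turn diverse value rankings into a single alignment target, here is a minimal sketch using the Borda count. The stakeholder rankings and value labels are hypothetical, and the paper does not endorse any particular aggregation rule.

from collections import defaultdict

def borda_count(rankings):
    """Aggregate individual preference rankings over value options
    into a single social ranking via the Borda count.

    rankings: list of lists, each ordering the options from most
    to least preferred."""
    scores = defaultdict(int)
    n = len(rankings[0])
    for ranking in rankings:
        for position, option in enumerate(ranking):
            scores[option] += n - 1 - position  # top choice earns n-1 points
    return sorted(scores, key=scores.get, reverse=True)

# Three stakeholders rank candidate value targets for an AI system.
rankings = [
    ["safety", "fairness", "efficiency"],
    ["fairness", "safety", "efficiency"],
    ["efficiency", "safety", "fairness"],
]
print(borda_count(rankings))  # ['safety', 'fairness', 'efficiency']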
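To illustrate how mechanism design can align self-interested behavior purely through the rules of the game, without touching any agent's utility function, the sketch below implements a Vickrey (second-price) auction, under which truthful bidding is a dominant strategy. The scenario of AI agents bidding for a right-of-way slot is a hypothetical example in the spirit of the traffic applications mentioned above, not one taken from the paper.

def second_price_auction(bids):
    """bids: dict mapping agent -> bid. Returns (winner, price).
    The winner pays the second-highest bid, which makes truthful
    bidding a dominant strategy for every agent."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]]
    return winner, price

# Three AI agents bid for a scarce resource, e.g. a right-of-way slot.
bids = {"agent_a": 9.0, "agent_b": 7.5, "agent_c": 4.0}
print(second_price_auction(bids))  # ('agent_a', 7.5)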
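The Off-Switch Game can likewise be sketched numerically. In the simplified version below (the Gaussian prior is an assumption of mine), a robot uncertain about the human's utility for its proposed action compares acting directly, switching itself off, and deferring to a rational human who blocks the action exactly when it is harmful; deferring weakly dominates the other two options.

import numpy as np

rng = np.random.default_rng(0)

# The robot is uncertain about the human's utility U for its proposed action.
# Toy prior: U ~ Normal(mu, sigma^2).
mu, sigma = 0.5, 1.0
samples = rng.normal(mu, sigma, 100_000)

act_directly   = samples.mean()                 # E[U]: proceed without asking
switch_off     = 0.0                            # do nothing
defer_to_human = np.maximum(samples, 0).mean()  # human blocks exactly when U < 0

print(f"act:   {act_directly:.3f}")
print(f"off:   {switch_off:.3f}")
print(f"defer: {defer_to_human:.3f}")  # weakly dominates both alternatives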
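Finally, a toy Bayesian persuasion computation in the spirit of Kamenica and Gentzkow, recast with an AI agent as the receiver; all numbers are illustrative. A sender who reports the good state truthfully, and reports "good" in the bad state just often enough to keep the posterior at the agent's action threshold, maximizes how often the agent takes the sender's preferred action.

# Bayesian persuasion toy example; the numbers are illustrative.
prior = 0.3        # P(state = "good") before any signal
threshold = 0.5    # the AI acts iff its posterior P(good) >= 0.5

# Optimal signal: always report "good" in the good state; in the bad state,
# report "good" just often enough that the posterior still meets the threshold.
q = prior * (1 - threshold) / ((1 - prior) * threshold)  # = 3/7 here

p_report_good = prior + (1 - prior) * q
posterior = prior / p_report_good

print(f"q = {q:.3f}, P(report good) = {p_report_good:.3f}, posterior = {posterior:.3f}")
# Without persuasion the AI never acts (prior < threshold);
# with the optimal signal it acts with probability 0.6.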
Keywords: AI alignment; AI value alignment; AI incentive compatibility alignment