
AI Alignment: What Can Economics Do?

The rapid development of AI technology has greatly advanced society, but it has also brought many problems and risks. Against this backdrop, AI alignment, that is, ensuring that the behavioral outcomes of AI always accord with human interests, has become increasingly important. From an economic perspective, AI alignment can be divided into AI value alignment and AI incentive compatibility alignment, which influence an AI agent's behavior by changing its utility function and by changing the constraints it faces, respectively. AI agents resemble humans in many respects. As the science of human behavior, economics can offer many insights for AI alignment. This paper focuses on the applications of social choice theory, mechanism design theory, contract theory, and information design theory to AI alignment, and introduces practical use cases of these theories. These use cases show that economics has great potential in the field of AI alignment, and that its role is still far from fully realized.
Artificial Intelligence Alignment: What Can Economics Do?
AI alignment refers to aligning the development of AI technology with human interests: ensuring that AI understands human norms and values, grasps human wishes and intentions, and acts according to human will. Depending on the alignment target, AI alignment can be divided into different levels. In practice, there are two main methods for AI alignment: reinforcement learning from human feedback (RLHF) and constitutional AI (CAI). RLHF relies on human feedback to adjust AI behavior, while CAI guides AI through predetermined rules. Although these methods have succeeded in practice, they also face issues such as diverse values and controversial rule setting. Some literature equates AI alignment with AI value alignment, but this view is incorrect. From an economic perspective, the goal of making an AI agent's behavior accord with human interests can be achieved either by changing its utility function or by changing the constraints it faces. This paper distinguishes between these two paths, referring to the former as AI value alignment and the latter as AI incentive compatibility alignment.

This paper points out that economic theories can provide many useful references for both AI value alignment and AI incentive compatibility alignment. How to select value goals for alignment in an environment of diverse AI values has always been a challenge, and many conclusions in social choice theory offer useful inspiration on this issue. When conducting incentive compatibility alignment, people can change game rules, incentive structures, or information structures to influence AI agents so that their behavior aligns with human interests. In this process, mechanism design, contract theory, and information design theory can provide many useful references.

Mechanism design theory mainly studies how to change the outcomes of games through game rules.
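As an illustration of the mechanism-design idea just stated (shaping outcomes through the rules of the game rather than through utility functions), a second-price (Vickrey) auction makes truthful bidding a dominant strategy purely by how payments are defined. The sketch below is a generic textbook example, not taken from the paper; the bidder names and valuations are assumptions:

```python
# Minimal sketch of a Vickrey (second-price) auction: the winner pays
# the second-highest bid, so the price is independent of the winner's
# own bid and truthful bidding is a dominant strategy.
# Bidder names and values below are illustrative assumptions.

def vickrey_auction(bids):
    """Allocate to the highest bidder; charge the second-highest bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # payment does not depend on the winner's own bid
    return winner, price

def utility(true_value, bid, others):
    """Winner's utility = true value - price; losers get 0."""
    winner, price = vickrey_auction({"me": bid, **others})
    return true_value - price if winner == "me" else 0.0

others = {"a": 6.0, "b": 4.0}
truthful = utility(true_value=8.0, bid=8.0, others=others)  # wins, pays 6.0
shaded = utility(true_value=8.0, bid=5.0, others=others)    # loses the item
assert truthful >= shaded  # misreporting never helps
print(truthful, shaded)
```

Under these rules the agent's best behavior (honest reporting) coincides with what the designer wants, without any change to the agent's utility function, which is exactly the incentive-compatibility idea the paper invokes.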
In the field of AI alignment, it can help people design better interaction rules for multi-agent systems, so that the overall behavioral consequences of such systems accord with human interests without changing the utility functions of the AI agents; in practice, it has been applied to traffic planning for autonomous vehicles. Contract theory studies how to promote cooperation between different entities through contractual arrangements. It inspires people to change AI behavior by arranging the rights, responsibilities, and incentive schemes between humans and AI. In the field of AI alignment, the design of reward schemes for multi-task AI agents and the Off-Switch Game both draw on contract theory. Information design theory focuses on changing equilibrium outcomes by altering information structures. In the field of AI, collaboration between humans and AI requires continuous information exchange, which enables humans to influence AI behavior by changing information structures; many theories in information design, such as Bayesian persuasion, can help people carry out such work.

In summary, AI agents share similarities with humans in many respects. As a science that studies human behavior, economics can provide many insights for AI alignment. The author hopes that this paper will encourage more economists to join this booming field and work together to promote the healthy development of AI.
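The Off-Switch Game mentioned above admits a minimal numeric sketch. In the model of Hadfield-Menell et al., a robot is uncertain about the human's utility u for its proposed action: acting directly yields roughly E[u], switching itself off yields 0, and deferring to a rational human who permits the action only when u >= 0 yields roughly E[max(u, 0)], so deferring weakly dominates. The Gaussian belief below is an illustrative assumption, not a detail from the paper:

```python
# Minimal numeric sketch of the Off-Switch Game: a robot uncertain about
# the human's utility u compares acting, shutting off, and deferring to
# a rational human who allows the action only when u >= 0.
# The standard-normal belief over u is an illustrative assumption.
import random

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # belief over u

act = sum(samples) / len(samples)                          # ~ E[u]
off = 0.0                                                  # payoff of shutdown
defer = sum(max(u, 0.0) for u in samples) / len(samples)   # ~ E[max(u, 0)]

# With a rational human overseer, deferring weakly dominates:
assert defer >= max(act, off)
print(f"act={act:.3f}  off={off:.3f}  defer={defer:.3f}")
```

The comparison shows why the robot has an incentive to keep the human's off switch available: the human's veto acts like a contractual clause that raises the robot's expected payoff under uncertainty.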

AI alignment; AI value alignment; AI incentive compatibility alignment

陈永伟


Research Department, Comparative Studies (《比较》), Beijing 100029


2024

Journal of Dongbei University of Finance and Economics (东北财经大学学报)
Dongbei University of Finance and Economics


Impact factor: 0.969
ISSN: 1008-4096
Year, Volume (Issue): 2024, (4)