Joint Design of Information and Action in Markov Signaling Game
In multi-agent reinforcement learning, information design is a game-theoretic method in which a self-interested agent influences the action policies of other agents by sending signals so as to maximize its own return; it is widely used in trading markets and economics. Existing information design methods impose a specific restriction: only the receiver can take actions in the environment, while the sender only sends signals and performs no actions. This assumption makes the theoretical model difficult to apply to many real-world scenarios. This study first establishes a Markov Signaling Game model that is better aligned with real-world scenarios, allowing the sender not only to send signals but also to take actions. It then proposes a Signaling-Action Gradient with Obedience Constraint (SAGOC) algorithm to compute the sender's optimal signaling and action policy that the receiver is willing to obey. The algorithm addresses both the non-stationarity that signals introduce into the policy updates of both parties and the receiver's obedience to the sent signals. Experimental results show that, compared with the benchmark algorithms, SAGOC is more effective in scenarios where the sender is allowed to send signals and take actions simultaneously.
information design; Markov Signaling Game; multi-agent reinforcement learning
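
To make the setting concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of one interaction round in a Markov Signaling Game in which the sender both emits a signal and acts, while the receiver observes only the signal. All sizes, tabular policies, rewards, and transitions below are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_SIGNALS, N_ACTIONS = 4, 3, 2  # toy sizes, chosen arbitrarily

# Hypothetical tabular policies: the sender maps the true state to a signal
# and to its own action; the receiver maps the observed signal to an action.
sender_signal_policy = rng.dirichlet(np.ones(N_SIGNALS), size=N_STATES)   # pi_sig(sigma | s)
sender_action_policy = rng.dirichlet(np.ones(N_ACTIONS), size=N_STATES)   # pi_send(a_s | s)
receiver_policy      = rng.dirichlet(np.ones(N_ACTIONS), size=N_SIGNALS)  # pi_recv(a_r | sigma)

def step(state):
    """One interaction round: the sender privately observes the state and
    chooses both a signal and an action; the receiver sees only the signal
    and then chooses its own action."""
    signal = rng.choice(N_SIGNALS, p=sender_signal_policy[state])
    sender_action = rng.choice(N_ACTIONS, p=sender_action_policy[state])
    receiver_action = rng.choice(N_ACTIONS, p=receiver_policy[signal])
    # Placeholder rewards and transition; a real environment would define these.
    sender_reward = float(sender_action == receiver_action)
    receiver_reward = float(receiver_action == state % N_ACTIONS)
    next_state = rng.integers(N_STATES)
    return next_state, signal, sender_action, receiver_action, sender_reward, receiver_reward

state = rng.integers(N_STATES)
for t in range(5):
    state, *transition = step(state)
    print(t, transition)
```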
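The obedience requirement mentioned in the abstract roughly means that, for every signal, the receiver should weakly prefer the action the signal induces over any deviation. The sketch below shows one generic way such a requirement could be folded into the sender's objective as a penalty; it is an assumption-laden illustration, not the SAGOC update itself, and Q_recv, recommended, and lam are hypothetical stand-ins.

```python
import numpy as np

def obedience_violation(Q_recv, recommended):
    """Total amount by which deviating beats the recommended action.
    Q_recv[sigma, a] is a hypothetical estimate of the receiver's expected
    return for taking action a after observing signal sigma."""
    n_signals = Q_recv.shape[0]
    gap = Q_recv.max(axis=1) - Q_recv[np.arange(n_signals), recommended]
    return np.maximum(gap, 0.0).sum()

# A Lagrangian-style penalized objective the sender could ascend:
#   J = E[sender return] - lam * obedience_violation
# trading off persuasion value against keeping the receiver willing to follow
# the signals (lam is a hypothetical multiplier, not a value from the paper).
Q_recv = np.array([[1.0, 0.4],
                   [0.2, 0.9],
                   [0.5, 0.5]])
recommended = np.array([0, 1, 0])
lam = 10.0
sender_return_estimate = 2.3  # placeholder value
penalized_objective = sender_return_estimate - lam * obedience_violation(Q_recv, recommended)
print(penalized_objective)
```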