Multi-agent Reinforcement Learning Algorithm Based on AI Planning

Deep reinforcement learning algorithms have achieved notable results in many application areas. In multi-agent tasks, however, agents often face large-scale, non-stationary environments with sparse rewards, and low exploration efficiency remains a major challenge. AI planning can quickly produce a solution from the initial state and goal state of a task, and this solution can serve as the initial policy of each agent and provide effective guidance for its exploration. This work therefore combines AI planning with multi-agent reinforcement learning and proposes UniMP (a Unified model for Multi-agent Reinforcement Learning and AI Planning). On this basis, a corresponding problem-solving mechanism is designed and built. First, the multi-agent reinforcement learning task is transformed into an intelligent decision task; second, heuristic search is performed on it to obtain a set of macro goals, which then guide reinforcement learning training so that the agents explore more efficiently. Experiments are carried out on various maps of the multi-agent real-time strategy game StarCraft Ⅱ and in the RoboMaster AI Challenge Simulator 2D (RMAICS) vehicle combat simulation environment. The results show that both cumulative reward and win rate improve significantly, which verifies the feasibility of the unified model, the effectiveness of the solution mechanism, and the ability of the proposed algorithm to respond flexibly to unexpected situations in the reinforcement learning environment.
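
The abstract describes the pipeline only at a high level: search for macro goals first, then use them to guide exploration. The Python sketch below illustrates that general idea on a toy grid and is not the UniMP implementation. The grid world, the function names `plan_macro_goals` and `shaped_reward`, the `stride` and `bonus` parameters, and the use of A* with a Manhattan heuristic are all assumptions made for illustration. It plans for a single agent; in a multi-agent setting one would presumably call it once per agent.

```python
import heapq


def plan_macro_goals(grid, start, goal, stride=2):
    """Run A* on a 4-connected grid (1 = wall) and return every
    `stride`-th cell of the shortest path as a macro goal.
    Returns [] if the goal is unreachable. Hypothetical helper."""
    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f, g, cell)
    parent = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if g > cost[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == 1:
                continue
            new_g = g + 1
            if new_g < cost.get((nr, nc), float("inf")):
                cost[(nr, nc)] = new_g
                parent[(nr, nc)] = cell
                heapq.heappush(frontier, (new_g + h((nr, nc)), new_g, (nr, nc)))

    if goal not in parent:
        return []

    # Walk back from the goal to recover the path, then thin it out.
    path, cell = [], goal
    while cell is not None:
        path.append(cell)
        cell = parent[cell]
    path.reverse()
    goals = path[stride::stride]
    if not goals or goals[-1] != goal:
        goals.append(goal)
    return goals


def shaped_reward(env_reward, position, goals, next_idx, bonus=1.0):
    """Add `bonus` to the sparse environment reward when the agent reaches
    its next pending macro goal; return (reward, updated goal index)."""
    if next_idx < len(goals) and position == goals[next_idx]:
        return env_reward + bonus, next_idx + 1
    return env_reward, next_idx
```

In a training loop, the list returned by `plan_macro_goals` would be computed once per episode, and `shaped_reward` would wrap the environment reward at every step, so the agent receives intermediate feedback even when the environment reward itself is sparse.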

Multi-agent reinforcement learning; AI planning; Heuristic search; Exploration efficiency

Xin Yuanxia (辛沅霞), Hua Daoyang (华道阳), Zhang Li (张犁)

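School of Software Technology, Zhejiang University, Ningbo, Zhejiang 315103

School of Physics, Zhejiang University, Hangzhou 310027

College of Computer Science and Technology, Zhejiang University, Hangzhou 310027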

2024

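Computer Science
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)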

CSTPCD; Peking University Core Journal
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(5)
  • 41