Curriculum Learning Framework Based on Reinforcement Learning in Sparse Heterogeneous Multi-agent Environments
The battlefield of modern warfare is large and involves a variety of unit types. Applying multi-agent reinforcement learning (MARL) to battlefield simulation can enhance collaborative decision-making among combat units and thus improve combat effectiveness. Current applications of MARL in military simulation often rely on two simplifications: the homogeneity of agents and the dense distribution of combat units. Real-world warfare scenarios do not always adhere to these assumptions and may involve various heterogeneous agents and sparsely distributed combat units. To explore the potential applications of reinforcement learning in a wider range of scenarios, this paper proposes improvements in both aspects. First, a multi-scale multi-agent amphibious landing environment (M2ALE) is designed to address the two simplifications, incorporating various heterogeneous agents and scenarios with sparsely distributed combat units. These complex settings exacerbate the exploration difficulty and non-stationarity of multi-agent environments, making them hard to train with commonly used multi-agent algorithms. Second, a heterogeneous multi-agent curriculum learning framework (HMACL) is proposed to address the challenges of the M2ALE environment. HMACL consists of three modules: a source task generating (STG) module, a class policy improving (CPI) module, and a Trainer module. The STG module generates source tasks to guide agent training. The CPI module adopts a class-based parameter sharing strategy that mitigates the non-stationarity of the multi-agent system and enables parameter sharing in a heterogeneous agent system. The Trainer module trains the latest policy with an arbitrary MARL algorithm, using the source tasks generated by the STG module and the latest policy from the CPI module. HMACL alleviates the exploration difficulty and non-stationarity issues of commonly used MARL algorithms in the M2ALE environment and guides the learning process of the multi-agent system. Experiments show that HMACL significantly improves the sampling efficiency and final performance of MARL algorithms in the M2ALE environment.
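To make the class-based parameter sharing idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: agents are grouped by class, and all agents of one class act through a single shared network, while different classes keep separate parameters. The unit class names and the observation/action sizes are illustrative assumptions.

```python
# Minimal sketch of class-based parameter sharing (the CPI module's idea),
# assuming PyTorch; the unit classes and dimensions below are illustrative
# assumptions, not taken from the paper.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small actor network; one instance is shared by all agents of a class."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # action logits

# One policy per agent *class*, not per agent: agents of the same class
# share parameters, while different (heterogeneous) classes keep their own.
agent_classes = {
    "tank":     {"obs_dim": 32, "act_dim": 6},
    "infantry": {"obs_dim": 24, "act_dim": 5},
    "ship":     {"obs_dim": 40, "act_dim": 8},
}
policies = {c: PolicyNet(**spec) for c, spec in agent_classes.items()}

def select_action(agent_class: str, obs: torch.Tensor) -> int:
    """Every agent of `agent_class` routes through the same shared network."""
    logits = policies[agent_class](obs)
    return torch.distributions.Categorical(logits=logits).sample().item()
```

This sits between the two usual extremes: full parameter sharing (one network for all agents, which breaks when observation and action spaces differ across heterogeneous units) and fully independent learners (one network per agent, which is sample-inefficient).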
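Similarly, a compact sketch of the HMACL control flow as described in the abstract: STG emits progressively harder source tasks, CPI stores the latest class-shared policies, and the Trainer runs an arbitrary MARL algorithm on the pair. All interfaces and the simple difficulty schedule here are hypothetical stand-ins, not the published API; the Trainer stub marks where any MARL algorithm (e.g. MAPPO or QMIX) would plug in.

```python
# Compact sketch of the HMACL outer loop: STG emits source tasks, CPI stores
# the latest class-shared policies, and the Trainer runs an arbitrary MARL
# algorithm. All interfaces are hypothetical stand-ins for illustration.
from dataclasses import dataclass, field

@dataclass
class SourceTaskGenerator:  # STG: proposes progressively harder source tasks
    difficulty: float = 0.1

    def generate(self) -> dict:
        task = {"num_units": max(1, int(20 * self.difficulty)),
                "map_scale": self.difficulty}
        self.difficulty = min(1.0, self.difficulty + 0.1)  # assumed schedule
        return task

@dataclass
class ClassPolicyStore:  # CPI: one shared policy per agent class
    policies: dict = field(default_factory=lambda: {"tank": None, "ship": None})

    def update(self, new_policies: dict) -> None:
        self.policies.update(new_policies)

def train_one_stage(task: dict, policies: dict) -> dict:
    """Trainer stub: plug in any MARL algorithm (e.g. MAPPO, QMIX) here."""
    return {c: ("trained", task["num_units"]) for c in policies}

stg, cpi = SourceTaskGenerator(), ClassPolicyStore()
for stage in range(5):  # curriculum: source tasks lead toward the target task
    task = stg.generate()
    cpi.update(train_one_stage(task, cpi.policies))
```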

Keywords: Multi-agent reinforcement learning; Combat simulation; Curriculum learning; Parameter sharing; Multi-agent environment design

Authors: 罗睿卿, 曾坤, 张欣景

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China

Unit 91976 of the Chinese People's Liberation Army, Guangzhou 510430, China

Funding: National Natural Science Foundation of China (U1711266); Guangdong Basic and Applied Basic Research Foundation (2019A1515011078)

Journal: Computer Science (计算机科学)
Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.944
ISSN: 1002-137X
Year, Volume (Issue): 2024, 51(1)