
Policy Transfer Reinforcement Learning Method for Partially Observable Conditions

Multi-agent reinforcement learning algorithms struggle to form effective cooperative policies under partially observable conditions. To address this problem, a policy transfer reinforcement learning method based on the centralized training and decentralized execution (CTDE) paradigm is proposed. First, a teacher module is trained under global observation so that it can explore good cooperative policies. Then, under partially observable conditions, the student module is trained online with maximization of the expected cumulative return as its objective function, while policy distillation transfers the policy from the teacher module and the weight of the teacher policy's influence on the student policy is adjusted adaptively. The method is validated by simulation in multiple map scenarios. The experimental results show that, under partially observable conditions, the win rate of the student module is higher than that of the baseline algorithms it is compared with. The results can be applied to multi-agent cooperative tasks to improve the cooperative performance of agents during decentralized execution.
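The abstract does not give the loss function or the rule used to adapt the teacher's influence, so the following is only a minimal sketch of how an online objective and a distillation term might be combined. Every name here (`distillation_loss`, `teacher_weight`, `student_loss`), the use of KL divergence as the distillation term, and the gap-based weighting rule are illustrative assumptions, not the authors' method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits):
    """Assumed distillation term: KL from the teacher's action
    distribution to the student's action distribution."""
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


def teacher_weight(student_score, teacher_score, max_weight=1.0):
    """Hypothetical adaptive rule (not from the paper): the teacher's
    weight shrinks as the student's score (e.g. win rate or return)
    approaches the teacher's, and is clipped to [0, max_weight]."""
    if teacher_score <= 0:
        return 0.0
    gap = (teacher_score - student_score) / teacher_score
    return max_weight * min(1.0, max(0.0, gap))


def student_loss(rl_loss, teacher_logits, student_logits, weight):
    """Online RL loss plus the weighted distillation term."""
    return rl_loss + weight * distillation_loss(teacher_logits, student_logits)


if __name__ == "__main__":
    w = teacher_weight(student_score=0.4, teacher_score=0.8)
    loss = student_loss(1.0, [2.0, 0.5, 0.1], [0.3, 0.3, 0.3], w)
    print(f"teacher weight = {w:.2f}, combined loss = {loss:.4f}")
```

In this sketch the teacher sees global state and the student only local observations, but both are assumed to output logits over the same action set, which is what makes the KL term well defined.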

Keywords: multi-agent; reinforcement learning; partial observation; policy transfer; centralized training and decentralized execution (CTDE)

Wang Zhongyu (王忠禹), Xu Xiaopeng (徐晓鹏), Wang Dong (王东)


School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China


National Natural Science Foundation of China (61973050, 62173061)

2024

Modern Defence Technology (现代防御技术)
Beijing Institute of Electronic System Engineering


CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.357
ISSN: 1009-086X
Year, volume (issue): 2024, 52(2)