Policy Transfer Reinforcement Learning Method for Partially Observable Conditions

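Abstract (translated from Chinese): Multi-agent reinforcement learning algorithms struggle to form effective cooperative policies under partially observable conditions. To address this, a policy transfer reinforcement learning method is proposed based on the centralized training and decentralized execution (CTDE) paradigm. The method first trains, under global observation, a teacher module that can explore good cooperative policies. Then, under partially observable conditions, a student module is trained online with maximizing the expected cumulative return as its objective function; at the same time, it uses policy distillation to transfer the policy from the teacher module and adaptively adjusts the weight of the teacher policy's influence on the student policy. The proposed method is validated in simulation on multiple map scenarios, and the experimental results show that, under partially observable conditions, the student module achieves a higher win rate than the baseline algorithms it is compared against. The results can be applied to multi-agent cooperative tasks to improve the cooperative performance of agents during decentralized execution.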

English title: Policy Transfer Reinforcement Learning Method for Partially Observable Conditions

English abstract: Multi-agent reinforcement learning algorithms fail to form effective collaborative policies under partially observable conditions. To address this problem, a policy transfer reinforcement learning method based on the centralized training and decentralized execution (CTDE) paradigm was proposed. First, under global observation, a teacher module was trained to explore good collaborative policies. Then, under partially observable conditions, a student module was trained online with maximizing the expected cumulative return as the objective function; at the same time, policy distillation was used to transfer the policy from the teacher module, and the proportion of the teacher policy's influence on the student policy was adjusted adaptively. Finally, the proposed method was verified by simulation in multiple map scenarios. The experimental results show that under partially observable conditions, the win rate of the student module is higher than that of the baseline algorithms. The research results can be applied to multi-agent collaborative tasks to improve the collaborative performance of agents in decentralized execution.
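
The abstract does not state the distillation loss or the rule used to adapt the teacher's weight. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a discrete action space, a KL-divergence distillation term KL(π_T ‖ π_S) between temperature-softened teacher and student action distributions, and a hypothetical weighting rule that shrinks the teacher's influence λ as the student's return approaches the teacher's. The student's total loss is then the RL loss plus λ times the distillation term. All function names (`student_loss`, `adaptive_weight`, `distillation_loss`) and the parameters `temperature` and `lam_max` are illustrative assumptions, and `rl_loss` is a scalar placeholder for whatever return-maximizing loss the student actually uses.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened action distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same action set."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_logits: Sequence[Sequence[float]],
                      student_logits: Sequence[Sequence[float]],
                      temperature: float = 1.0) -> float:
    """Mean over agents of KL(pi_teacher || pi_student).

    Assumption: the teacher acts on global observations and the student on
    local (partial) observations, but both output logits over the same actions.
    """
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)


def adaptive_weight(teacher_return: float, student_return: float,
                    lam_max: float = 1.0) -> float:
    """Hypothetical rule: teacher influence shrinks as the student closes the return gap.

    Returns lam_max when the student's return is far below the teacher's,
    and 0 once the student matches or exceeds the teacher.
    """
    scale = max(abs(teacher_return), 1e-8)  # avoid division by zero
    gap = (teacher_return - student_return) / scale
    return lam_max * min(max(gap, 0.0), 1.0)


def student_loss(rl_loss: float,
                 teacher_logits: Sequence[Sequence[float]],
                 student_logits: Sequence[Sequence[float]],
                 teacher_return: float, student_return: float,
                 temperature: float = 1.0, lam_max: float = 1.0) -> float:
    """Total student loss = RL loss (placeholder scalar) + lambda * distillation loss."""
    lam = adaptive_weight(teacher_return, student_return, lam_max)
    return rl_loss + lam * distillation_loss(teacher_logits, student_logits, temperature)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]]  # per-agent logits (toy values)
    student = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.1]]
    print(student_loss(rl_loss=0.8, teacher_logits=teacher, student_logits=student,
                       teacher_return=10.0, student_return=5.0))
```

Under this hypothetical rule, a teacher return of 10 and a student return of 5 give λ = 0.5 × `lam_max`; once the student matches or exceeds the teacher, the distillation term drops out and training reduces to the return-maximizing objective alone, so the student is not tied to the teacher after it has caught up.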

English keywords:

multi-agent reinforcement learning; partial observation; policy transfer; centralized training and decentralized execution (CTDE)

Authors:

王忠禹 (Wang Zhongyu), 徐晓鹏 (Xu Xiaopeng), 王东 (Wang Dong)

Affiliation:

School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China

Keywords:

multi-agent reinforcement learning; partial observation; policy transfer; centralized training and decentralized execution

Funding:

National Natural Science Foundation of China

Grant numbers:

61973050; 62173061

Year of publication:

2024

DOI:

10.3969/j.issn.1009-086x.2024.02.007

Modern Defence Technology (现代防御技术)

Beijing Institute of Electronic System Engineering

CSTPCD; Peking University Core Journal

Impact factor: 0.357

ISSN: 1009-086X

Year, volume (issue): 2024, 52(2)

Number of references: 21