Policy Transfer Reinforcement Learning Method for Partially Observable Conditions
Multi-agent reinforcement learning algorithms often fail to form an effective collaborative policy under partially observable conditions. To address this problem, a policy transfer reinforcement learning method based on the centralized training and decentralized execution (CTDE) paradigm was proposed. First, under global observation, a teacher module was trained to explore a good collaborative policy. Then, under partially observable conditions, a student module was trained online with the maximization of expected cumulative return as the objective; at the same time, policy distillation was used to transfer the policy from the teacher module to the student module, adaptively adjusting the weight with which the teacher policy influences the student policy. Finally, the proposed method was verified by simulation in multiple map scenarios. The experimental results show that, under partially observable conditions, the success rate of the student module is higher than that of the baseline algorithms. The research results can be applied to multi-agent collaborative tasks, improving the collaborative performance of agents during decentralized execution.
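The teacher-to-student transfer described above can be sketched as a combined objective: the student's ordinary RL loss plus a distillation term that pulls the student policy toward the teacher's action distribution, with an adaptive weight that decays over training. This is only a minimal illustration; the function names, the exponential decay schedule, and the KL-based distillation term are assumptions, not the paper's exact formulation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions (assumed form)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def student_loss(rl_loss, teacher_probs, student_probs, step, decay=1e-3):
    """Student objective: RL loss plus a distillation term whose weight
    decays with the training step, so the student gradually relies
    less on the teacher policy (hypothetical adaptive schedule)."""
    alpha = math.exp(-decay * step)  # adaptive weight on the teacher policy
    distill = kl_divergence(teacher_probs, student_probs)
    return rl_loss + alpha * distill
```

Under this sketch, early in training (small `step`) the teacher's policy dominates the distillation term, while late in training the student optimizes almost purely its own return, which mirrors the adaptive adjustment the abstract describes.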
multi-agent reinforcement learning; partial observation; policy transfer; centralized training and decentralized execution (CTDE)