

Evolutionary Reinforcement Learning Combining Meta-Learning and Safe Region Exploration
The recently proposed evolutionary reinforcement learning (ERL) framework has demonstrated that using evolutionary algorithms to strengthen the exploration ability of reinforcement learning benefits performance. However, existing ERL-based methods do not fully solve the scalability problem of mutation in evolutionary algorithms, and the limitations of evolutionary algorithms themselves make ERL slow to solve problems. To keep each exploration step of the algorithm within a safe region and to converge in a short time, the idea of meta-learning is first used to pre-train an initial population, which needs only a few generations of evolution to achieve good results on a task. The pre-trained population is then used to handle tasks; in this process, sensitivity is used to adjust the range of population mutation, restricting mutation to the safe region and ensuring that mutation does not bring unexpected consequences. The method is evaluated on five robotic locomotion tasks from the OpenAI gym. In all test environments, the method achieves competitive results against the baselines ERL, CEM-RL, and two state-of-the-art RL algorithms, PPO and TD3.
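The sensitivity-guided mutation described in the abstract resembles output-gradient safe mutation: each parameter's random perturbation is shrunk in proportion to how strongly that parameter influences the policy's behavior, so sensitive weights move little and the offspring stays in a safe region. A minimal NumPy sketch, assuming a precomputed per-parameter sensitivity vector; the function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def safe_mutate(weights, sensitivity, sigma=0.1, eps=1e-8):
    """Gaussian mutation scaled inversely by per-parameter sensitivity.

    `sensitivity` is assumed precomputed, e.g. the magnitude of the
    policy-output gradient with respect to each weight, averaged over
    sampled states. High-sensitivity weights receive small perturbations,
    keeping the mutated policy close to its parent's behavior.
    """
    noise = rng.normal(0.0, sigma, size=weights.shape)
    return weights + noise / (sensitivity + eps)

# Hypothetical example: a 4-parameter policy vector.
w = np.array([0.5, -1.2, 0.3, 0.8])
s = np.array([10.0, 0.1, 1.0, 5.0])  # larger value => more sensitive weight
w_new = safe_mutate(w, s)
```

In a full ERL loop, this mutation would replace the fixed-variance Gaussian operator applied to each member of the pre-trained population at every generation; the scaling is what bounds how far a single mutation can move the policy's behavior.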

evolutionary reinforcement learning; meta-learning; pre-training; safe region; mutation operator

Li Xiaoyi, Hu Bin, Qin Jin, Peng Anlang


State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025

College of Computer Science and Technology, Guizhou University, Guiyang 550025

Guizhou Zhaoxin Digital Technology Co., Ltd., Guiyang 550025


2025

Computer Engineering and Applications
North China Institute of Computing Technology


Indexed in: Peking University Core Journals (北大核心)
Impact factor: 0.683
ISSN:1002-8331
Year, volume (issue): 2025, 61(1)