Evolutionary Reinforcement Learning Combining Meta-Learning and Safe Region Exploration
The recently proposed evolutionary reinforcement learning (ERL) framework has demonstrated that the exploration ability of evolutionary algorithms can improve the performance of reinforcement learning. However, existing ERL-based methods do not fully address the scalability problem of mutation in evolutionary algorithms, and ERL converges slowly due to the limitations of evolutionary algorithms. To restrict the exploration at each step of the algorithm to a safe region and achieve convergence in a short time, the idea of meta-learning is first used to pre-train an initial population, which then needs only a few rounds of evolution to perform well on a task. Second, the pre-trained population is applied to the target tasks. During this process, sensitivity is used to adjust the range of population mutation, confining mutations to the safe region and ensuring that they do not cause unexpected consequences. The method is evaluated on five robot locomotion tasks from the OpenAI Gym. In all test environments, the method achieves competitive results compared with the baselines ERL and CEM-RL and two state-of-the-art RL algorithms, PPO and TD3.
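The abstract does not specify how sensitivity is computed; the description resembles safe mutations through output gradients, where each parameter's perturbation is damped by how strongly it affects the policy's outputs. Below is a minimal sketch under that assumption, not the authors' implementation: sensitivity is estimated as the per-parameter gradient magnitude of a PyTorch MLP policy's actions over a batch of archived states, and Gaussian mutation noise is divided by it. The policy architecture, the function names (`make_policy`, `safe_mutate`), and the hyperparameters are all hypothetical.

```python
# Illustrative sketch of sensitivity-scaled ("safe-region") mutation.
# Assumption: sensitivity = |d(action outputs)/d(weight)| over archived states,
# so weights the behavior is most sensitive to receive the smallest mutations.
import torch
import torch.nn as nn


def make_policy(obs_dim: int, act_dim: int) -> nn.Module:
    # Hypothetical MLP policy; the paper's actual architecture is unspecified.
    return nn.Sequential(
        nn.Linear(obs_dim, 64), nn.Tanh(),
        nn.Linear(64, act_dim), nn.Tanh(),
    )


def safe_mutate(policy: nn.Module, states: torch.Tensor,
                sigma: float = 0.1, eps: float = 1e-8) -> None:
    """Mutate `policy` in place, scaling noise by inverse output sensitivity."""
    policy.zero_grad()
    # Gradient of the summed action outputs w.r.t. each weight approximates
    # how strongly that weight moves the policy's behavior on these states.
    policy(states).sum().backward()
    with torch.no_grad():
        for p in policy.parameters():
            sensitivity = p.grad.abs() + eps     # per-parameter sensitivity
            noise = torch.randn_like(p) * sigma  # raw Gaussian mutation
            p.add_(noise / sensitivity)          # damp sensitive directions
    policy.zero_grad()


if __name__ == "__main__":
    torch.manual_seed(0)
    policy = make_policy(obs_dim=11, act_dim=3)  # e.g. Hopper-like dimensions
    archived_states = torch.randn(128, 11)       # stand-in for replayed states
    safe_mutate(policy, archived_states)         # one safe mutation step
```

In an ERL-style loop, a mutation of this form would be applied to each member of the pre-trained population at every generation, with the batch of states drawn from a shared replay buffer.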