Abstract
Skill learning through reinforcement learning has made significant progress in recent years. However, it often struggles to find optimal or near-optimal policies efficiently due to the inherent trial-and-error exploration of reinforcement learning. Although algorithms have been proposed to enhance skill learning efficacy, there is still considerable room for improvement in terms of skill learning performance and training stability. In this paper, we propose an algorithm called skill enhancement learning with knowledge distillation (SELKD), which integrates multiple actors and multiple critics for skill learning. SELKD employs knowledge distillation to establish a mutual learning mechanism among the actors. To mitigate critic overestimation bias, we introduce a novel target value calculation method. We also provide a theoretical analysis to ensure the convergence of SELKD. Finally, experiments are conducted on several continuous control tasks, illustrating the effectiveness of the proposed algorithm.
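Since the abstract only names the components, the following is a minimal sketch of how a conservative multi-critic target and a mutual distillation term among actors are commonly combined; it is not the paper's exact formulation. The ensemble size, the min-over-critics target, and the MSE-based distillation toward peer actions are illustrative assumptions.

```python
# Sketch only: one plausible instantiation of a multi-critic target value
# (to curb overestimation) and actor-to-actor mutual distillation.
import torch


def conservative_target(rewards, dones, next_q_values, gamma=0.99):
    """Bootstrapped target using the minimum over an ensemble of target critics.

    next_q_values: (num_critics, batch) tensor of Q(s', a') estimates;
    taking the element-wise minimum reduces overestimation bias (assumed here).
    """
    min_next_q, _ = next_q_values.min(dim=0)
    return rewards + gamma * (1.0 - dones) * min_next_q


def mutual_distillation_loss(actions_per_actor):
    """Symmetric distillation among actors (mutual learning sketch).

    actions_per_actor: list of (batch, action_dim) action tensors, one per actor.
    Each actor is pulled toward the detached mean action of its peers.
    """
    losses = []
    for i, a_i in enumerate(actions_per_actor):
        peers = torch.stack([a for j, a in enumerate(actions_per_actor) if j != i])
        peer_mean = peers.mean(dim=0).detach()  # peers act as soft teachers
        losses.append(torch.mean((a_i - peer_mean) ** 2))
    return torch.stack(losses).mean()
```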
Funding
"New Generation Artificial Intelligence"Key Field Research and Development Plan of Guangdong Province(2021B0101410002)
National Science and Technology Major Project of the Ministry of Science and Technology of China (2018AAA0102900)
National Natural Science Foundation of China (U22A2057)
National Natural Science Foundation of China (62133013)