Dependable policy improvement for intelligent agents in new environments
Original text links: NETL | NSTL | Elsevier
Intelligent agents often struggle to balance safety and performance when transitioning from general training scenarios to specific task scenarios, owing to unknown environmental differences. Under the uncertainty of a new scenario, safety considerations constrain extensive exploration, which limits policy improvement. This paper proposes a novel reinforcement learning approach featuring a dependable policy improvement algorithm that maintains safety and confidence throughout the entire training process. The proposed algorithm improves upon the baseline policy developed in general training scenarios, using it to guide exploration, and designs confidence bounds to evaluate both task performance and safety. By exploring cautiously and updating policies only when the data confidence bounds permit, the approach ensures reliable agent behavior in new, uncertain, and potentially risky environments. Simulation experiments with an automated guided vehicle (AGV) demonstrate the effectiveness of the approach across various scenarios.
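The abstract does not reproduce the paper's algorithmic details, but the general idea of confidence-bound-gated policy updates can be sketched as follows. This is a minimal illustration, not the authors' method: the function names, the Hoeffding-style bound, and the accept/reject rule are all assumptions chosen for clarity.

```python
import math

def lower_confidence_bound(samples, delta=0.05, value_range=1.0):
    """Hoeffding-style lower bound on the mean of bounded samples.

    With probability at least 1 - delta, the true mean is above this bound
    (assuming i.i.d. samples bounded within `value_range`).
    """
    n = len(samples)
    mean = sum(samples) / n
    radius = value_range * math.sqrt(math.log(1.0 / delta) / (2 * n))
    return mean - radius

def accept_candidate(perf_samples, safety_samples,
                     baseline_perf, safety_threshold, delta=0.05):
    """Accept a candidate policy only if BOTH confidence bounds clear:
    its performance lower bound must beat the baseline policy's value,
    and its safety lower bound must meet the safety threshold.
    Otherwise the agent keeps the current (baseline) policy.
    """
    perf_lcb = lower_confidence_bound(perf_samples, delta)
    safety_lcb = lower_confidence_bound(safety_samples, delta)
    return perf_lcb > baseline_perf and safety_lcb >= safety_threshold

# With many consistent rollouts, the bounds are tight and the update is accepted;
# with few samples, the wide bounds force the cautious choice of keeping the baseline.
confident = accept_candidate([0.9] * 200, [1.0] * 200,
                             baseline_perf=0.5, safety_threshold=0.8)
sparse = accept_candidate([0.55] * 10, [1.0] * 10,
                          baseline_perf=0.5, safety_threshold=0.8)
print(confident, sparse)  # True False
```

The design point this illustrates is the one the abstract emphasizes: under uncertainty, the burden of proof is on the candidate policy, so limited or noisy data defaults the agent to its dependable baseline behavior.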