Dependable policy improvement for intelligent agents in new environments
Original text links: NETL | NSTL | Elsevier
Intelligent agents often struggle to balance safety and performance when transitioning from general training scenarios to specific task scenarios, owing to unknown environmental differences. Under the uncertainty of a new scenario, safety considerations constrain extensive exploration, which limits policy improvement. This paper proposes a novel reinforcement learning approach featuring a dependable policy improvement algorithm that maintains safety and confidence throughout the entire training process. The proposed algorithm improves upon the baseline policy developed in general training scenarios, using it to guide exploration, and designs confidence bounds to evaluate both task performance and safety. By exploring cautiously and updating policies only when the data confidence bounds permit, the approach ensures reliable agent behavior in new, uncertain, and potentially risky environments. Simulation experiments with an automated guided vehicle (AGV) demonstrate the effectiveness of the approach across various scenarios.
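The abstract does not reproduce the paper's algorithmic details, but the general idea of confidence-bound-gated policy updates can be sketched as follows. This is a minimal illustration, not the authors' method: the function names, the Hoeffding-style bound, and the accept/reject rule are all assumptions chosen for clarity.

```python
import math

def lower_confidence_bound(samples, delta=0.05, value_range=1.0):
    """Hoeffding-style lower bound on the mean of bounded samples.

    With probability at least 1 - delta, the true mean is above this bound
    (assuming i.i.d. samples bounded within `value_range`).
    """
    n = len(samples)
    mean = sum(samples) / n
    radius = value_range * math.sqrt(math.log(1.0 / delta) / (2 * n))
    return mean - radius

def accept_candidate(perf_samples, safety_samples,
                     baseline_perf, safety_threshold, delta=0.05):
    """Accept a candidate policy only if BOTH confidence bounds clear:
    its performance lower bound must beat the baseline policy's value,
    and its safety lower bound must meet the safety threshold.
    Otherwise the agent keeps the current (baseline) policy.
    """
    perf_lcb = lower_confidence_bound(perf_samples, delta)
    safety_lcb = lower_confidence_bound(safety_samples, delta)
    return perf_lcb > baseline_perf and safety_lcb >= safety_threshold

# With many consistent rollouts, the bounds are tight and the update is accepted;
# with few samples, the wide bounds force the cautious choice of keeping the baseline.
confident = accept_candidate([0.9] * 200, [1.0] * 200,
                             baseline_perf=0.5, safety_threshold=0.8)
sparse = accept_candidate([0.55] * 10, [1.0] * 10,
                          baseline_perf=0.5, safety_threshold=0.8)
print(confident, sparse)  # True False
```

The design point this illustrates is the one the abstract emphasizes: under uncertainty, the burden of proof is on the candidate policy, so limited or noisy data defaults the agent to its dependable baseline behavior.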