Dependable policy improvement for intelligent agents in new environments

Intelligent agents often encounter challenges in balancing safety and performance when transitioning from general training scenarios to specific task scenarios due to unknown environmental differences. Under the uncertainty of new scenarios, safety considerations constrain extensive exploration, resulting in limited policy improvement. This paper proposes a novel reinforcement learning approach featuring a dependable policy improvement algorithm that emphasizes safety and confidence throughout the entire training process. The proposed algorithm enhances the baseline policy developed in general training scenarios to guide exploration and designs confidence bounds to evaluate both task performance and safety. By cautiously exploring and updating policies based on data confidence bounds, the approach ensures reliable agent behavior in new, uncertain, and potentially risky environments. Simulation experiments with an automatic guided vehicle (AGV) demonstrate the effectiveness of this approach across various scenarios.
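The abstract's idea of updating a policy only when confidence bounds support both performance and safety can be illustrated with a minimal sketch. This is not the authors' algorithm: the Hoeffding bound, the function names `hoeffding_radius` and `dependable_update`, and the parameters `cost_limit` and `delta` are assumptions chosen for illustration, and returns and costs are assumed to lie in known bounded ranges.

```python
import math
from statistics import mean


def hoeffding_radius(n, value_range, delta):
    """One-sided Hoeffding half-width for the mean of n samples
    bounded in an interval of width value_range (confidence 1 - delta)."""
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def dependable_update(baseline_returns, candidate_returns, candidate_costs,
                      return_range, cost_range, cost_limit, delta=0.05):
    """Hypothetical acceptance test: adopt the candidate policy only if
    its return is confidently above the baseline's and its safety cost
    is confidently below cost_limit."""
    r_base = hoeffding_radius(len(baseline_returns), return_range, delta)
    r_cand = hoeffding_radius(len(candidate_returns), return_range, delta)
    c_cand = hoeffding_radius(len(candidate_costs), cost_range, delta)

    baseline_upper = mean(baseline_returns) + r_base    # optimistic baseline
    candidate_lower = mean(candidate_returns) - r_cand  # pessimistic candidate
    cost_upper = mean(candidate_costs) + c_cand         # pessimistic safety cost

    return candidate_lower > baseline_upper and cost_upper <= cost_limit


if __name__ == "__main__":
    # With many episodes the bounds are tight enough to accept the candidate.
    print(dependable_update([0.5] * 200, [0.8] * 200, [0.0] * 200,
                            return_range=1.0, cost_range=1.0, cost_limit=0.1))
    # With few episodes the same averages are not yet trustworthy.
    print(dependable_update([0.5] * 10, [0.8] * 10, [0.0] * 10,
                            return_range=1.0, cost_range=1.0, cost_limit=0.1))
```

In this sketch, a candidate that looks better on average is still rejected when too little data has been collected, which mirrors the cautious exploration described in the abstract: the agent keeps the baseline policy until the evidence is strong enough.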

Reinforcement learning; Uncertain environment; Safe exploration; Confidence bound; Dependable policy improvement

Li, Yao; Liang, Zhenglin

Tsinghua University Department of Industrial Engineering

2025

Reliability engineering & system safety

SCI
ISSN:0951-8320
Year, volume (issue): 2025, 261 (Sep.)
  • 36