New Robotics Study Findings Recently Were Reported by Researchers at Tsinghua University (Model-based Chance-constrained Reinforcement Learning Via Separated Proportional-integral Lagrangian)
By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News -- Data detailed on Robotics have been presented. According to news originating from Beijing, People's Republic of China, by NewsRx correspondents, research stated, "Safety is essential for reinforcement learning (RL) applied in the real world. Adding chance constraints (or probabilistic constraints) is a suitable way to enhance RL safety under uncertainty."

Funders for this research include the International Science and Technology Cooperation Program of China and the National Natural Science Foundation of China (NSFC).

Our news journalists obtained a quote from the research from Tsinghua University: "Existing chance-constrained RL methods, such as the penalty methods and the Lagrangian methods, either exhibit periodic oscillations or learn an over-conservative or unsafe policy. In this article, we address these shortcomings by proposing a separated proportional-integral Lagrangian (SPIL) algorithm. We first review the constrained policy optimization process from a feedback control perspective, which regards the penalty weight as the control input and the safe probability as the control output. Based on this, the penalty method is formulated as a proportional controller, and the Lagrangian method is formulated as an integral controller. We then unify them and present a proportional-integral Lagrangian method to get both their merits, with an integral separation technique to limit the integral value to a reasonable range. To accelerate training, the gradient of safe probability is computed in a model-based manner. The convergence of the overall algorithm is analyzed. We demonstrate that our method can reduce the oscillations and conservatism of RL policy in a car-following simulation."
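The quoted passage casts the penalty weight as the output of a PI controller driven by the gap between a target safe probability and the policy's current safe probability, with the integral accumulator clipped ("integral separation") to keep it in a reasonable range. The sketch below illustrates that weight-update rule only; the function name, gains kp and ki, the integral bound, and the toy loop are illustrative assumptions, not the paper's actual implementation or hyperparameters.

```python
import numpy as np

def pi_lagrangian_update(safe_prob, target_prob, integral,
                         kp=1.0, ki=0.1, integral_max=10.0):
    """One proportional-integral update of the chance-constraint penalty weight.

    The constraint violation (target_prob - safe_prob) acts as the control
    error: the proportional term reacts to the current violation, the integral
    term accumulates past violations, and clipping the accumulator (integral
    separation) keeps the weight from winding up and oscillating.
    """
    error = target_prob - safe_prob                                 # > 0 when the policy is unsafe
    integral = float(np.clip(integral + error, 0.0, integral_max))  # integral separation
    penalty_weight = max(0.0, kp * error + ki * integral)           # multiplier stays nonnegative
    return penalty_weight, integral


if __name__ == "__main__":
    # Toy illustration: a policy whose safe probability slowly improves.
    integral, safe_prob = 0.0, 0.90
    for step in range(5):
        weight, integral = pi_lagrangian_update(safe_prob, target_prob=0.99,
                                                integral=integral)
        print(f"step {step}: safe_prob={safe_prob:.3f} penalty_weight={weight:.3f}")
        safe_prob = min(0.99, safe_prob + 0.02)  # stand-in for one policy-optimization step
```

In an SPIL-style trainer, a weight computed this way would scale the chance-constraint term added to the policy objective at each update, while the safe probability and its gradient would come from the learned dynamics model, as described in the quote above.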
Keywords: Beijing, People's Republic of China, Asia, Emerging Technologies, Machine Learning, Reinforcement Learning, Robot, Robotics, Tsinghua University