Robot Control Method Based on Double Limit Q Learning
Offline reinforcement learning, which can train satisfactory policies without any interaction between the agent and the environment, has developed rapidly in recent years. To alleviate extrapolation error and the over-conservatism of existing offline reinforcement learning algorithms, this paper proposes DIQL, an offline reinforcement learning algorithm based on double-limited Q-learning. The first limit constrains the Q-value network so that its estimates for out-of-distribution (OOD) actions do not stray far from the data-augmented state-value estimate V; the second limits the policy so that the mean squared error between its generated OOD actions and the dataset's action distribution does not grow too large. Under this double limit, exploration is encouraged, so the algorithm can achieve good results even when the dataset quality is poor. To verify the effectiveness of the algorithm, experiments are carried out in a gait-control environment for a bipedal 6-DOF robot. The results show that DIQL handles OOD actions effectively and alleviates both extrapolation error and over-conservatism.
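The two limits described above can be sketched as penalty terms added to the training loss. The sketch below is illustrative only: the function name, the hinge form of each penalty, and the margin/threshold parameters (`q_margin`, `mse_limit`) are assumptions, not the paper's actual loss.

```python
import numpy as np

def diql_penalties(q_ood, v_state, pi_actions, dataset_actions,
                   q_margin=1.0, mse_limit=0.5):
    """Hypothetical sketch of DIQL's double limit as two penalty terms.

    Limit 1: Q-value estimates for OOD actions are penalized when they
    exceed the (data-augmented) state-value estimate V by more than
    q_margin.
    Limit 2: the policy's generated actions are penalized when their
    mean squared error to the dataset actions exceeds mse_limit.
    """
    # Penalty 1: hinge on Q(s, a_ood) straying above V(s) + margin
    q_penalty = np.mean(np.maximum(q_ood - (v_state + q_margin), 0.0) ** 2)
    # Penalty 2: hinge on the policy/dataset action MSE exceeding its limit
    action_mse = np.mean((pi_actions - dataset_actions) ** 2)
    mse_penalty = max(action_mse - mse_limit, 0.0)
    return q_penalty, mse_penalty

# Example: one Q estimate above the allowed band, actions far from the data
q_p, m_p = diql_penalties(np.array([3.0, 1.0]), np.array([1.0, 1.0]),
                          np.array([0.0, 0.0]), np.array([1.0, 1.0]))
```

Because both penalties vanish inside their allowed bands, the agent is free to explore as long as it stays within the double limit, which is how the constraints can coexist with the exploration encouraged in the abstract.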