Pre-training Method Based on Model Gradients for Enhancing the Generalization of Universal Control Policies
The integration of reinforcement learning with robotic control tasks has led to significant advances. Among these approaches, a universal control policy serves as a single policy model applicable to various robot morphologies, overcoming the limitation of traditional methods that require independently training a control policy for each robot type. This approach can effectively reduce computational costs while providing a degree of generalization to unseen morphologies. However, existing methods suffer from severe overfitting and insufficient generalization. To enhance the generalizability of universal control policies, a pre-training method based on model-gradient selection is proposed. The method leverages the principle that the larger the model gradient update induced by a training sample, the more information unknown to the model the sample contains. Using the loss gradients of the model during training as an indicator, a pre-training morphology set that represents the morphology space is constructed, effectively improving the generalization ability of the universal control policy. Transfer experiments on morphologies randomly generated across the entire morphology space show that the method provides significant advantages in improving the generalization ability of universal control policies.
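The following is a minimal sketch, not the authors' implementation, of the gradient-magnitude selection principle described above: morphologies whose training batches induce larger policy-loss gradients are treated as carrying more information unknown to the model and are retained in the pre-training morphology set. The policy network, loss function, and the `candidate_batches` mapping are hypothetical placeholders.

```python
import torch
import torch.nn as nn


def gradient_norm(policy: nn.Module, loss: torch.Tensor) -> float:
    """Return the L2 norm of the loss gradient w.r.t. the policy parameters."""
    params = [p for p in policy.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, allow_unused=True)
    total = 0.0
    for g in grads:
        if g is not None:
            total += g.pow(2).sum().item()
    return total ** 0.5


def select_pretraining_set(policy: nn.Module, candidate_batches: dict, loss_fn, k: int):
    """Rank candidate morphologies by the gradient magnitude their data induces
    and keep the top-k as the pre-training morphology set."""
    scores = {}
    for morph_id, (obs, target) in candidate_batches.items():
        loss = loss_fn(policy(obs), target)      # policy loss on this morphology's batch
        scores[morph_id] = gradient_norm(policy, loss)
    # Larger gradient norm -> more information unknown to the model -> higher priority.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]
```

In this sketch the selected subset would then be used to pre-train the universal control policy before transfer to unseen morphologies; the exact loss and data format depend on the underlying reinforcement-learning algorithm.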
robot; universal control policy; generalization; model gradient