Pre-training Method Based on Model Gradients for Enhancing the Generalization of Universal Control Policies
The integration of reinforcement learning with robotic control tasks has led to significant advances. Among these approaches, a universal control policy serves as a single policy model applicable to various robot morphologies, overcoming the limitation of traditional methods that require independently training a control policy for each robot type. This approach can effectively reduce computational costs while providing a degree of generalization to unseen morphologies. However, existing methods suffer from severe overfitting and insufficient generalization. To enhance the generalizability of universal control policies, a pre-training method based on model-gradient selection is proposed. The method leverages the principle that the larger the model gradient update induced by a training sample, the more information unknown to the model the sample contains. Using the loss gradients of the model during training as an indicator, a pre-training morphology set that represents the morphology space is constructed, effectively improving the generalization ability of the universal control policy. Transfer experiments on morphologies randomly generated across the entire morphology space show that the method provides significant advantages in improving the generalization ability of universal control policies.
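The following is a minimal sketch, not the authors' implementation, of the gradient-magnitude selection principle described above: morphologies whose training batches induce larger policy-loss gradients are treated as carrying more information unknown to the model and are retained in the pre-training morphology set. The policy network, loss function, and the `candidate_batches` mapping are hypothetical placeholders.

```python
import torch
import torch.nn as nn


def gradient_norm(policy: nn.Module, loss: torch.Tensor) -> float:
    """Return the L2 norm of the loss gradient w.r.t. the policy parameters."""
    params = [p for p in policy.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, allow_unused=True)
    total = 0.0
    for g in grads:
        if g is not None:
            total += g.pow(2).sum().item()
    return total ** 0.5


def select_pretraining_set(policy: nn.Module, candidate_batches: dict, loss_fn, k: int):
    """Rank candidate morphologies by the gradient magnitude their data induces
    and keep the top-k as the pre-training morphology set."""
    scores = {}
    for morph_id, (obs, target) in candidate_batches.items():
        loss = loss_fn(policy(obs), target)      # policy loss on this morphology's batch
        scores[morph_id] = gradient_norm(policy, loss)
    # Larger gradient norm -> more information unknown to the model -> higher priority.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]
```

In this sketch the selected subset would then be used to pre-train the universal control policy before transfer to unseen morphologies; the exact loss and data format depend on the underlying reinforcement-learning algorithm.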
robot; universal control policy; generalization; model gradient