Robust observer-based deep reinforcement learning for attitude stabilization of a vertical takeoff and landing vehicle

Li Yanling (李彦铃)¹, Luo Feizhou (罗飞舟)², Ge Zhilei (葛致磊)¹

Author information

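1. School of Astronautics, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
2. China Academy of Launch Vehicle Technology, Beijing 100076, China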

Abstract

This paper addresses the attitude stabilization of a vertical takeoff and landing vehicle subject to elastic vibration and model-uncertainty disturbances. It combines a robust observer with the proximal policy optimization algorithm from deep reinforcement learning, giving a robust observer-based proximal policy optimization (ROB-PPO) control method. A robust observer is designed to reconstruct the vehicle attitude information that is corrupted by elastic vibration. The observer and the vehicle dynamics model together form the environment, and the reconstructed attitude produced by the observer serves as the state of the deep reinforcement learning algorithm. The agent interacts with this environment repeatedly and is thereby trained to stabilize the vehicle attitude. Simulation results show that the ROB-PPO algorithm is more robust and converges faster than the commonly used adaptive fuzzy proportional-integral-derivative (PID) algorithm. Finally, the effectiveness of the proposed algorithm is verified on a self-developed vertical takeoff and landing vehicle.
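The structure described in the abstract (observer plus dynamics forming the environment, with the reconstructed attitude as the agent's state) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the plant is a toy single-axis double integrator, the elastic vibration is a single sinusoid added to the angle measurement, a plain Luenberger-style observer stands in for the paper's robust observer, and the class name `ObserverInLoopEnv`, the reward weights, and all numeric values are hypothetical. The function `ppo_clip_objective` is the standard PPO clipped surrogate, evaluated on given probability ratios and advantages.

```python
import math
import random


class ObserverInLoopEnv:
    """Toy plant + observer exposed as one RL environment (illustrative only)."""

    def __init__(self, dt=0.01, seed=0):
        self.dt = dt
        self.rng = random.Random(seed)
        # Assumed observer gains: error poles of s^2 + 20 s + 100 sit at s = -10.
        self.l1, self.l2 = 20.0, 100.0
        # Assumed vibration corrupting the angle measurement (rad, rad/s).
        self.flex_amp, self.flex_freq = 0.01, 40.0
        self.reset()

    def reset(self):
        self.t = 0.0
        self.theta = self.rng.uniform(-0.2, 0.2)  # true attitude angle (rad)
        self.omega = 0.0                          # true angular rate (rad/s)
        self.theta_hat, self.omega_hat = 0.0, 0.0  # observer estimates
        return (self.theta_hat, self.omega_hat)

    def step(self, u):
        u = max(-5.0, min(5.0, float(u)))  # saturate the control command
        # Measurement: true angle plus the vibration component.
        y = self.theta + self.flex_amp * math.sin(self.flex_freq * self.t)
        innov = y - self.theta_hat
        # Observer update (forward Euler), using the previous estimates.
        theta_hat = self.theta_hat + self.dt * (self.omega_hat + self.l1 * innov)
        omega_hat = self.omega_hat + self.dt * (u + self.l2 * innov)
        self.theta_hat, self.omega_hat = theta_hat, omega_hat
        # Plant update (forward Euler) for the double integrator.
        theta = self.theta + self.dt * self.omega
        omega = self.omega + self.dt * u
        self.theta, self.omega = theta, omega
        self.t += self.dt
        # Assumed quadratic penalty on attitude error, rate, and control effort.
        reward = -(self.theta ** 2 + 0.1 * self.omega ** 2 + 0.001 * u ** 2)
        # The agent sees only the reconstructed state, never the true one.
        return (self.theta_hat, self.omega_hat), reward


def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    terms = []
    for r, a in zip(ratios, advantages):
        r_clipped = max(1.0 - eps, min(1.0 + eps, r))
        terms.append(min(r * a, r_clipped * a))
    return sum(terms) / len(terms)
```

In a training loop, a PPO agent would call `reset()` and `step()` on this environment and maximize the clipped objective over collected rollouts. A fixed feedback law on the estimated state, for example `u = -4.0 * theta_hat - 4.0 * omega_hat`, can stand in for the agent when checking that the loop is wired correctly.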

Key words

vertical takeoff and landing vehicle / attitude control / robust observer / deep reinforcement learning

Publication year

2024

Systems Engineering and Electronics

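Defense Technology Academy of China Aerospace Science and Industry Corporation; Chinese Society of Astronautics; Systems Engineering Society of China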

CSTPCD; Peking University Core Journals

Impact factor: 0.847

ISSN: 1001-506X

Number of references: 34
