Simulation of Robot Autonomous Obstacle Avoidance Algorithm Based on Hierarchical Reinforcement Learning

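Abstract (translated from Chinese): Intelligent robots can perceive their surroundings in real time and control their trajectories by building a map of the environment. How to navigate autonomously from a start point to a goal while avoiding obstacles and obtaining an optimal path, however, still requires further work. To improve the robot's path-planning ability and reduce the probability of collisions with obstacles, a robot autonomous obstacle avoidance method based on hierarchical reinforcement learning is proposed. A kinematic model is built from the robot's linear velocity, angular velocity, and related information, and local and global coordinate systems are established. Through coordinate transformation, information about the robot and the obstacles is collected, and an overall hierarchical reinforcement learning architecture is constructed with three levels: environment information interaction, subtask selection, and root-task cooperation. Q-learning is used as the reinforcement learning strategy, and an update rule for the Q-function value is defined. The environment state is represented as a Cartesian product, and a suitable reward function is chosen to improve learning efficiency. The robot's best action is selected by maximizing the Q value, which achieves autonomous obstacle avoidance. Experimental results verify that the method accurately avoids static and dynamic obstacles, has low computational complexity, and avoids becoming trapped in local optima.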
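The kinematic model and the local-to-global coordinate transformation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the equations, so this assumes a standard unicycle model driven by linear velocity `v` and angular velocity `omega`, integrated with a simple Euler step; the function names `step_pose` and `local_to_global` are hypothetical and not taken from the paper.

```python
import math


def step_pose(x, y, theta, v, omega, dt):
    """Advance a unicycle-model pose (assumed model) by one Euler step.

    x, y   : robot position in the global frame
    theta  : heading in radians, measured from the global x-axis
    v      : linear velocity, omega : angular velocity, dt : time step
    """
    x_new = x + v * math.cos(theta) * dt
    y_new = y + v * math.sin(theta) * dt
    # Wrap the heading into [-pi, pi) so it stays bounded.
    theta_new = (theta + omega * dt + math.pi) % (2 * math.pi) - math.pi
    return x_new, y_new, theta_new


def local_to_global(x, y, theta, px_local, py_local):
    """Map a point seen in the robot's local frame into the global frame.

    This is a 2-D rotation by the robot heading followed by a translation
    by the robot position, e.g. for an obstacle detected by an onboard sensor.
    """
    gx = x + px_local * math.cos(theta) - py_local * math.sin(theta)
    gy = y + px_local * math.sin(theta) + py_local * math.cos(theta)
    return gx, gy
```

For example, a robot at (1, 2) facing along the global y-axis (heading of pi/2) that detects an obstacle one unit straight ahead would place it at roughly (1, 3) in the global frame.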

English title: Simulation of Robot Autonomous Obstacle Avoidance Algorithm Based on Hierarchical Reinforcement Learning

English abstract: The intelligent robot can perceive environment information in real time and control its action trajectory by drawing an environment map. To improve path-planning ability and reduce the probability of collision between the robot and obstacles, this paper proposes an autonomous obstacle avoidance algorithm based on hierarchical reinforcement learning. Combining the moving speed and angular speed, a kinematics model is built, and local and global coordinate systems are established. Through coordinate transformation, information about the robot and the obstacles is collected. A hierarchical reinforcement learning architecture is then constructed with three levels: environment information interaction, sub-task selection, and root task cooperation. The Q-learning method is used as the reinforcement learning strategy, and the rule for updating the Q-function value is determined. The environment state information is expressed in the form of a Cartesian product, and a reasonable reward function is chosen to improve learning efficiency. Finally, the optimal action of the robot is selected by maximizing the Q value, which achieves autonomous obstacle avoidance. Experimental results show that the proposed method can avoid static and dynamic obstacles accurately, has low computational complexity, and avoids falling into local optima.
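The Q-learning steps named in the abstract (a Cartesian-product state space, a reward function, a Q-value update rule, and action selection by maximizing Q) can be sketched in tabular form. The abstract does not specify the state variables, reward values, or update rule, so everything concrete here is an assumption: the bins, the action set, the reward constants, and the textbook update Q(s,a) ← Q(s,a) + α[r + γ max Q(s',·) − Q(s,a)]. The sketch is a flat learner and does not reproduce the paper's three-level hierarchy.

```python
import itertools
import random

# Hypothetical discretisation of what the robot senses (not from the paper).
DISTANCE_BINS = ("near", "mid", "far")        # distance to nearest obstacle
BEARING_BINS = ("left", "front", "right")     # direction of that obstacle
GOAL_BINS = ("left", "ahead", "right")        # direction of the goal
ACTIONS = ("turn_left", "forward", "turn_right")

# The state space is the Cartesian product of the individual feature sets.
STATES = list(itertools.product(DISTANCE_BINS, BEARING_BINS, GOAL_BINS))

# Tabular Q function, initialised to zero for every (state, action) pair.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}


def reward(collided, reached_goal):
    """Assumed reward: penalise collisions, reward the goal, small step cost."""
    if collided:
        return -10.0
    if reached_goal:
        return 10.0
    return -0.1


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update; returns the new Q(s, a)."""
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


def greedy_action(q, s):
    """Pick the action with the largest Q value in state s."""
    return max(ACTIONS, key=lambda a: q[(s, a)])


def epsilon_greedy(q, s, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return greedy_action(q, s)
```

With three bins per feature, the product yields 27 states. After a single collision penalty for driving forward in some state, the greedy policy in that state no longer chooses `forward`, which is the mechanism by which the learned Q values steer the robot away from obstacles.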

English keywords:

Robot; Hierarchical reinforcement learning; Autonomous obstacle avoidance; Learning strategies; Reward function

Authors:

An Yanxia (安燕霞), Zheng Xiaoxia (郑晓霞)

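Author affiliations:

School of Intelligent Engineering, Jinzhong College of Information, Jinzhong, Shanxi 030800

College of Aeronautics and Astronautics, Taiyuan University of Technology, Taiyuan, Shanxi 030024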

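Keywords (translated from Chinese):

Robot; hierarchical reinforcement learning; autonomous obstacle avoidance; learning strategy; reward function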

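Funding:

Shanxi Province "1331 Project" Construction Project; Shanxi Province Teaching Reform Project (2021); Shanxi Province Planning Project (14th Five-Year Plan)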

Project numbers:

J2021952; GH-220338

Publication year:

2024

Computer Simulation (计算机仿真)

The 17th Research Institute of China Aerospace Science and Industry Corporation

CSTPCD

Impact factor: 0.518

ISSN: 1006-9348

Year, volume (issue): 2024, 41(4)

Number of references: 15