Reinforcement Learning Navigation Method Based on Advantage Hindsight Experience Replay
Reinforcement learning demonstrates significant potential in the field of mobile robots. By combining reinforcement learning algorithms with robot navigation, autonomous navigation can be achieved without prior knowledge. However, robot reinforcement learning suffers from several disadvantages, such as low sample utilization and poor generalization ability. Hence, based on the D3QN algorithm, this paper proposes an advantage hindsight experience replay algorithm for replaying experience samples. First, the advantage function value of each trajectory point in a trajectory sample is calculated, and the point with the maximum advantage function value is selected as the target point. Subsequently, the trajectory sample is relabeled, and both the old and new trajectory samples are placed into the experience pool to increase the diversity of experience samples, thus allowing the agent to learn to navigate to the target point more efficiently by learning from failed experience samples. To assess the validity of the proposed approach, different experimental environments are established on the Gazebo platform, and a TurtleBot3 robot is used to conduct navigation training and transfer tests in the simulation environment. The results show that the navigation success rate in the training environment is higher than that yielded by current mainstream algorithms, and that the maximum navigation success rate achieved in the transfer test environment is 86.33%. The improved algorithm enhances the utilization of navigation samples, reduces the difficulty of learning navigation strategies, and strengthens the robot's autonomous navigation ability and transfer generalization ability across different environments.
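The relabeling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy critic `q_fn` stands in for the trained D3QN network, the advantage is approximated as A(s, a) = Q(s, a) − mean over actions of Q(s, ·) (a common dueling-network convention; the paper's exact definition may differ), and the function and reward names are hypothetical.

```python
def advantage(q_values, action):
    # Approximate A(s, a) = Q(s, a) - V(s), taking V(s) as the mean of Q over actions
    # (one common dueling-network convention; an assumption, not the paper's definition).
    v = sum(q_values) / len(q_values)
    return q_values[action] - v

def relabel_with_max_advantage(trajectory, q_fn, reach_reward=1.0, step_reward=-0.01):
    """Hindsight relabeling of a failed episode.

    trajectory: list of (state, action, reward, next_state) transitions.
    Returns a relabeled copy whose goal is the visited point with maximum advantage.
    """
    # 1. Score every visited point by its advantage under the current critic.
    advs = [advantage(q_fn(s), a) for (s, a, _, _) in trajectory]
    # 2. The point with the maximum advantage becomes the substituted goal.
    goal_idx = max(range(len(advs)), key=advs.__getitem__)
    goal_state = trajectory[goal_idx][0]
    # 3. Relabel rewards as if goal_state had been the target all along.
    relabeled = []
    for i, (s, a, _, s2) in enumerate(trajectory[: goal_idx + 1]):
        r = reach_reward if i == goal_idx else step_reward
        relabeled.append((s, a, r, s2, goal_state))
    return relabeled

# Example: a failed 3-step episode (state, action, reward, next_state)
traj = [(0, 1, -0.01, 1), (1, 1, -0.01, 2), (2, 1, -0.01, 3)]

def q_fn(s):
    # Toy critic standing in for the trained D3QN (values are arbitrary).
    return [0.1 * s, 0.2 * s, -0.1 * s]

# Both the original and the relabeled trajectories enter the experience pool,
# increasing sample diversity as described in the abstract.
replay_buffer = list(traj)
replay_buffer.extend(relabel_with_max_advantage(traj, q_fn))
```

In this sketch the relabeled transitions carry the substituted goal explicitly, so a goal-conditioned value network could condition on it; a plain D3QN would instead fold the goal into the state representation.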