
Multi-step Information Aided Q-learning Path Planning Algorithm

To improve the path-planning capability of mobile robots in static environments and to address the slow convergence of the traditional Q-learning algorithm in path planning, this paper proposes an improved Q-learning algorithm based on a multi-step information aiding mechanism. The multi-step information of the greedy actions under the ε-greedy strategy and the length of the historical optimal path are used to update the eligibility traces, so that effective traces keep contributing as the algorithm iterates, and the saved multi-step information resolves the loop traps the robot may fall into. A local multi-flower variant of the flower pollination algorithm initializes the Q-value table, improving the robot's early search efficiency. Finally, an action selection strategy is designed around the goals of the robot's different exploration stages, combining the standard deviation of the iterative path lengths with the number of times the robot successfully reaches the target point, to strengthen the algorithm's balance between exploring and exploiting environmental information. Experimental results show that the proposed algorithm converges quickly, verifying its feasibility and effectiveness.
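The mechanisms named in the abstract extend standard tabular Q-learning with eligibility traces on a grid map. As a minimal point of reference, the sketch below implements only the conventional baseline (Watkins-style Q(λ) with an ε-greedy policy on a small grid), with reward values, grid size, and hyperparameters chosen for illustration; the paper's multi-step aiding mechanism, flower-pollination Q-table initialization, and adaptive action selection strategy are not reproduced here.

```python
import numpy as np

# Illustrative baseline only: tabular Q(lambda) with Watkins-style eligibility
# traces and an epsilon-greedy policy on an N x N grid. Start is (0, 0), the
# goal is (N-1, N-1). All constants below are assumed for the example.
N = 5
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
ALPHA, GAMMA, LAM, EPS = 0.1, 0.95, 0.9, 0.1

def step(state, a):
    """Apply action a; bumping into the boundary leaves the state unchanged."""
    r, c = state
    dr, dc = ACTIONS[a]
    nr = min(max(r + dr, 0), N - 1)
    nc = min(max(c + dc, 0), N - 1)
    done = (nr, nc) == (N - 1, N - 1)
    reward = 1.0 if done else -0.01            # small step cost, goal reward
    return (nr, nc), reward, done

def train(episodes=300, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N, N, len(ACTIONS)))
    for _ in range(episodes):
        E = np.zeros_like(Q)                   # eligibility traces, per episode
        s, done, steps = (0, 0), False, 0
        while not done and steps < 200:
            greedy = int(np.argmax(Q[s]))
            a = greedy if rng.random() > EPS else int(rng.integers(len(ACTIONS)))
            s2, r, done = step(s, a)
            delta = r + GAMMA * np.max(Q[s2]) - Q[s][a]
            E[s][a] += 1.0                     # accumulating trace
            Q += ALPHA * delta * E             # credit recently visited pairs
            if a == greedy:
                E *= GAMMA * LAM               # decay traces after greedy step
            else:
                E[:] = 0.0                     # Watkins: cut traces on exploration
            s = s2
            steps += 1
    return Q

Q = train()
# After training, following the greedy policy from the start should reach the goal.
```

The trace matrix `E` is what the paper's multi-step mechanism manipulates further: instead of decaying traces uniformly, effective traces are kept alive using multi-step greedy-action information and the historical optimal path length.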

path planning; Q-learning; convergence speed; action selection strategy; grid map

王越龙、王松艳、晁涛


Control and Simulation Center, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China


2024

Journal of System Simulation
Beijing Simulation Center; Chinese Association for System Simulation

Indexing: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.551
ISSN:1004-731X
Year, volume (issue): 2024, 36(9)