
Path Planning for Deep Reinforcement Learning with Adaptive Greedy Factor

DQN (Deep Q-Network), a pioneering deep reinforcement learning algorithm, performs well in path planning, but it still suffers from value overestimation, shortcomings in its experience replay mechanism, and a poor balance between exploration and exploitation. To address these problems, a deep reinforcement learning path planning algorithm with an adaptive greedy factor is proposed. First, a prioritized experience replay mechanism is introduced on top of the D3QN algorithm, which addresses the overestimation problem while raising the sampling probability of important samples, thereby improving the efficiency of the algorithm. Second, a new reward function is designed to make actions more distinguishable. Finally, an adaptively adjustable greedy factor is designed to balance exploration and exploitation. An environment map is built with the TensorFlow framework and the Tkinter library in Python to verify the effectiveness of the algorithm. The results show that the improved algorithm outperforms the DQN algorithm in both the optimal path obtained and the number of iterations.
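
The abstract does not state the form of the adaptive greedy factor or of the replay priorities, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. The schedule in `adaptive_epsilon` (exponential decay scaled by a recent success rate) and every parameter name are hypothetical. `sampling_probabilities` uses the standard proportional form of prioritized experience replay, P(i) = p_i^alpha / sum_k p_k^alpha with p_i = |delta_i| + eps.

```python
import math
import random


def adaptive_epsilon(episode, recent_success_rate,
                     eps_min=0.05, eps_max=1.0, decay=0.01):
    """Hypothetical adaptive greedy factor (an assumption, not from the paper).

    Epsilon decays exponentially with the episode index and shrinks further
    as the fraction of recent episodes that reached the goal increases.
    """
    base = eps_min + (eps_max - eps_min) * math.exp(-decay * episode)
    eps = eps_min + (base - eps_min) * (1.0 - recent_success_rate)
    return max(eps_min, min(eps_max, eps))


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay probabilities from TD errors."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]
```

With `epsilon=0.0` the agent always takes the greedy action, and with `recent_success_rate=1.0` the hypothetical schedule falls to `eps_min`. Transitions with larger absolute TD error receive a higher sampling probability.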

Keywords: path planning; deep reinforcement learning; exploration factor; experience replay

Zeng Mingru, Tu Jiahao, Zhu Qin, Song Shijie


School of Information Engineering, Nanchang University, Nanchang, Jiangxi 330031, China

School of Information Management, Nanchang University, Nanchang, Jiangxi 330031, China


2024

Computer Simulation
The 17th Research Institute of China Aerospace Science and Industry Corporation


CSTPCD
Impact factor: 0.518
ISSN:1006-9348
Year, volume (issue): 2024, 41(9)