
Method for Optimizing DQN Network Parameters Based on Evolutionary Algorithms

The study addresses the problems of blind search, unbalanced exploration-exploitation, and slow convergence that DQN (Deep Q Network) exhibits in its early training stages. From the perspective of acquiring and exploiting information that benefits training during early exploration, and taking the Differential Evolution (DE) algorithm as an example, the paper proposes DE-DQN, a method that optimizes the DQN network parameters with an evolutionary algorithm to accelerate convergence. First, the network parameters of the DQN are encoded as evolutionary individuals. Second, two fitness evaluation metrics, "run length" and "average return", are employed separately, and their effectiveness is verified through simulation comparisons on the CartPole control problem. Finally, the experimental results show that, after training the agent for 5,000 generations, the proposed algorithm improves run length, average return, and cumulative return by 82.7%, 18.1%, and 25.1%, respectively, when "run length" is used as the fitness function, and by 74.9%, 18.5%, and 13.3%, respectively, when "average return" is used, outperforming the improved DQN algorithm in both cases. It is concluded that, compared with the traditional DQN and its improved variants, DE-DQN acquires more useful information in the early stages and therefore converges faster.
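The abstract's core idea, encoding a flattened set of network parameters as a DE individual and selecting on an episode-based fitness, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `fitness` function here is a toy quadratic standing in for the "run length" or "average return" an agent would earn on CartPole, and the function names, dimensions, and hyperparameters (`F`, `CR`, population size) are all assumptions for the sketch.

```python
import random

# Toy stand-in for evaluating one encoded parameter set. In the paper's setup
# this would load the vector into the DQN, run CartPole episodes, and return
# "run length" or "average return"; here a quadratic keeps it self-contained.
def fitness(params):
    return -sum((p - 0.5) ** 2 for p in params)  # maximized when all genes = 0.5

def de_optimize(fitness, dim=8, pop_size=20, F=0.5, CR=0.9, generations=100, seed=0):
    rng = random.Random(seed)
    # Each individual is one flattened network-parameter vector.
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    fits = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # DE/rand/1 mutation: three distinct individuals other than i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + F * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover, forcing at least one gene from the mutant
            jrand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < CR or k == jrand) else pop[i][k]
                     for k in range(dim)]
            tf = fitness(trial)
            if tf >= fits[i]:  # greedy selection keeps the better parameter set
                pop[i], fits[i] = trial, tf
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]

best_params, best_fit = de_optimize(fitness)
```

Swapping the toy `fitness` for an episode-evaluation routine turns the same loop into the evolutionary pretraining step the abstract describes; the two metrics compared in the paper differ only in what that routine returns.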

Keywords: deep reinforcement learning; deep Q network; convergence acceleration; evolutionary algorithms; automatic control

曹子建、郭瑞麒、贾浩文、李骁、徐恺


School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021, China


Natural Science Basic Research Program of Shaanxi Province (2020JM-565)

2024

Journal of Xi'an Technological University
Xi'an Technological University

Indexed in: CSTPCD, CHSSCD
Impact factor: 0.381
ISSN: 1673-9965
Year, Volume (Issue): 2024, 44(2)