
Method for Optimizing DQN Network Parameters Based on Evolutionary Algorithms

The study addresses the problems of blind search, unbalanced exploration-exploitation, and slow convergence that DQN (Deep Q Network) exhibits in its early training stages. From the perspective of acquiring and exploiting information that benefits training during early exploration, and taking the Differential Evolution (DE) algorithm as an example, the paper proposes DE-DQN, a method that optimizes the DQN network parameters with an evolutionary algorithm to accelerate convergence. First, the network parameters of the DQN are encoded as evolutionary individuals. Second, two fitness evaluation metrics, "run length" and "average return", are employed separately, and their effectiveness is verified through simulation comparisons on the CartPole control problem. Finally, the experimental results show that, after training the agent for 5,000 generations, the proposed algorithm improves run length, average return, and cumulative return by 82.7%, 18.1%, and 25.1%, respectively, when "run length" is used as the fitness function, and by 74.9%, 18.5%, and 13.3%, respectively, when "average return" is used, outperforming the improved DQN algorithm in both cases. It is concluded that, compared with the traditional DQN and its improved variants, DE-DQN acquires more useful information in the early stages and therefore converges faster.
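The abstract's core idea, encoding a flattened set of network parameters as a DE individual and selecting on an episode-based fitness, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `fitness` function here is a toy quadratic standing in for the "run length" or "average return" an agent would earn on CartPole, and the function names, dimensions, and hyperparameters (`F`, `CR`, population size) are all assumptions for the sketch.

```python
import random

# Toy stand-in for evaluating one encoded parameter set. In the paper's setup
# this would load the vector into the DQN, run CartPole episodes, and return
# "run length" or "average return"; here a quadratic keeps it self-contained.
def fitness(params):
    return -sum((p - 0.5) ** 2 for p in params)  # maximized when all genes = 0.5

def de_optimize(fitness, dim=8, pop_size=20, F=0.5, CR=0.9, generations=100, seed=0):
    rng = random.Random(seed)
    # Each individual is one flattened network-parameter vector.
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    fits = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # DE/rand/1 mutation: three distinct individuals other than i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + F * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover, forcing at least one gene from the mutant
            jrand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < CR or k == jrand) else pop[i][k]
                     for k in range(dim)]
            tf = fitness(trial)
            if tf >= fits[i]:  # greedy selection keeps the better parameter set
                pop[i], fits[i] = trial, tf
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]

best_params, best_fit = de_optimize(fitness)
```

Swapping the toy `fitness` for an episode-evaluation routine turns the same loop into the evolutionary pretraining step the abstract describes; the two metrics compared in the paper differ only in what that routine returns.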

Keywords: deep reinforcement learning; deep Q network; convergence acceleration; evolutionary algorithms; automatic control

曹子建、郭瑞麒、贾浩文、李骁、徐恺


School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021, China


Natural Science Basic Research Program of Shaanxi Province (2020JM-565)

2024

Journal of Xi'an Technological University
Xi'an Technological University

Indexed in: CSTPCD, CHSSCD
Impact factor: 0.381
ISSN: 1673-9965
Year, Volume (Issue): 2024, 44(2)