首页|策略梯度的超启发算法求解带容量约束车辆路径问题

策略梯度的超启发算法求解带容量约束车辆路径问题

扫码查看
有容量车辆路径问题是组合优化问题中比较热门的问题,它属于经典的NP-hard问题并且时间复杂度高。本文提出了一种基于策略梯度的超启发算法,将强化学习中的确定性策略梯度算法引入到超启发算法的高层策略中的底层算法选择策略,确定性策略梯度算法采用Actor-Critic框架,另外为了能够在后续计算和神经网络参数更新中引用历史经验数据,在确定性策略梯度算法中设计了经验池用于存储状态转移数据。在超启发算法解的接受准则方面,文中通过实验对比了3种接受准则的效果,最终选择了自适应接受准则作为高层策略中解的接受准则。通过对有容量车辆路径问题标准算例的计算,并将求解结果与其他算法对比,验证了所提算法在该问题求解上的有效性和稳定性。
Hyper-heuristic for the capacitated vehicle routing problem with policy gradient
The capacitated vehicle routing problem is popular in combinatorial optimization.It is a classic NP-hard problem with high time complexity.This paper proposed a hyper-heuristic algorithm based on policy gradient.The deter-ministic policy gradient algorithm in reinforcement learning is introduced into the low-level algorithm selection strategy in the high-level heuristic strategy of the hyper-heuristic algorithm.The deterministic policy gradient algorithm adopts the Actor-Critic framework.In addition,to reference historical experience data in subsequent calculations and parameter up-dates of neural networks,the experience pool is designed to store state transition data in a deterministic policy gradient algorithm.In terms of the acceptance criteria of the hyper-heuristic algorithm,the paper compared the effects of the three acceptance criteria through experiments,and finally,the adaptive acceptance criterion is chosen as the acceptance criteri-on in the high-level heuristic strategy.The effectiveness and stability of the proposed algorithm in solving the capacitated vehicle routing problem are verified by calculating the standard example and comparing with the results of other algorithms.

vehicle routing problemreinforcement learningpolicy gradientneural networkshyper-heuristic

张景玲、孙钰粟、赵燕伟、余孟凡、蒋玉勇

展开 >

浙江工业大学特种装备制造及先进加工技术教育部重点实验室,浙江杭州 310014

车辆路径问题 强化学习 关策略梯度算法 神经网络 超启发算法

国家自然科学基金项目浙江省自然科学基金项目

61402409LY19F030017

2024

控制理论与应用
华南理工大学 中国科学院数学与系统科学研究院

控制理论与应用

CSTPCD北大核心
影响因子:1.076
ISSN:1000-8152
年,卷(期):2024.41(6)