Hyper-heuristic for the capacitated vehicle routing problem with policy gradient
The capacitated vehicle routing problem is popular in combinatorial optimization.It is a classic NP-hard problem with high time complexity.This paper proposed a hyper-heuristic algorithm based on policy gradient.The deter-ministic policy gradient algorithm in reinforcement learning is introduced into the low-level algorithm selection strategy in the high-level heuristic strategy of the hyper-heuristic algorithm.The deterministic policy gradient algorithm adopts the Actor-Critic framework.In addition,to reference historical experience data in subsequent calculations and parameter up-dates of neural networks,the experience pool is designed to store state transition data in a deterministic policy gradient algorithm.In terms of the acceptance criteria of the hyper-heuristic algorithm,the paper compared the effects of the three acceptance criteria through experiments,and finally,the adaptive acceptance criterion is chosen as the acceptance criteri-on in the high-level heuristic strategy.The effectiveness and stability of the proposed algorithm in solving the capacitated vehicle routing problem are verified by calculating the standard example and comparing with the results of other algorithms.