首页|基于多起点和Mask策略的深度强化学习算法求解覆盖旅行商问题

基于多起点和Mask策略的深度强化学习算法求解覆盖旅行商问题

扫码查看
覆盖旅行商问题(covering salesman problem,CSP)是旅行商问题的变体,在防灾规划、急救管理中有着广泛应用。由于传统方法求解问题实例耗时严重,近年来深度神经网络被提出用于解决该类组合优化问题,在求解速度和泛化性上有明显的优势。现有基于深度神经网络求解CSP的方法求解质量较低,特别在大规模实例上与传统的启发式方法相比存在较大差距。针对上述问题,提出一种新的基于深度强化学习求解CSP的方法,由编码器对输入特征进行编码,提出新的Mask策略对解码器使用自注意力机制构造解的过程进行约束,并提出多起点策略改善训练过程、提高求解质量。实验结果表明,所提方法对比现有基于深度神经网络的求解方法进一步缩小了最优间隙,同时有着更高的样本效率,在不同规模和不同覆盖类型的CSP中展现出更强的泛化能力,与启发式算法相比在求解速度上有10~40倍的提升。
Deep reinforcement learning algorithm based on multi-start and Mask strategy for solving the covering salesman problem
The covering salesman problem(CSP)is a variant of the traveling salesman problem,which is widely used in disaster planning and emergency management.Since the traditional solvers are time-consuming for solving problem instances,deep neural networks have been proposed for solving this type of combinatorial optimization problem in recent years,which have obvious advantages in terms of solving speed and generalization.However,existing methods based on deep neural networks for solving CSP problems have low solution quality,especially in large-scale instances,compared with traditional heuristics.Therefore,we propose a new method based on deep reinforcement learning to solve the CSP problem,in which the encoder encodes the input features,propose a new Mask strategy to constrain the decoder to construct a solution using the self-attention mechanism,and propose a multi-start strategy to improve the training process and the solution quality.Experimental results show that the proposed method further reduces the optimality gap compared with existing deep neural network-based solution methods,has higher sample efficiency,shows stronger generalization ability in CSP tasks of different sizes and coverage types,and has a 10-40 times improvement in solution speed compared with heuristic algorithms.

covering salesman problemdeep reinforcement learningcombinatorial optimizationmulti-startMask strategy

方伟、接中冰、陆恒杨、张涛

展开 >

江南大学江苏省人工智能国际合作联合实验室,江苏无锡 214122

江南大学江苏省模式识别与计算智能工程实验室,江苏无锡 214122

中国船舶科学研究中心,江苏无锡 214082

覆盖旅行商 深度强化学习 组合优化 多起点 Mask策略

国家自然科学基金项目国家自然科学基金项目国家自然科学基金项目国家自然科学基金项目船舶总体性能创新研究开放基金项目

6207315562002137621060886220611322422213

2024

控制与决策
东北大学

控制与决策

CSTPCD北大核心
影响因子:1.227
ISSN:1001-0920
年,卷(期):2024.39(4)
  • 15