Science and Technology Management Research, 2025, Vol. 45, Issue 7: 194-206. DOI: 10.3969/j.issn.1000-7695.2025.7.020

Research on Optimization of Multi-UAV Collaborative Delivery Route Combination Based on Deep Reinforcement Learning

Kong Fanhui (孔繁辉)¹, Jiang Bin (姜斌)²

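Author information

  • 1. School of Management Engineering, Qingdao University of Technology, Qingdao, Shandong 266520
  • 2. College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao, Shandong 257061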

Abstract

Unmanned aerial vehicle (UAV) delivery is an important avenue for addressing the "last mile" logistics distribution problem. This study focuses on route optimization for UAV logistics delivery and introduces a deep reinforcement learning algorithm to optimize route combinations under a multi-UAV collaborative delivery mode. Unlike traditional exact and heuristic algorithms, the approach takes full account of the characteristics of UAV logistics delivery, in particular by analyzing how nonlinear energy consumption affects delivery potential. On this basis, a mixed-integer programming model with multiple constraints is constructed, and a multi-layer, self-updating generative feedforward network is trained through a Pointer Network (Ptr-Net) model to optimize the order in which multiple UAVs serve their combined delivery sequences. The results show that the deep reinforcement learning method achieves higher optimization efficiency than traditional algorithms. In addition, the attention mechanism at the decoder strengthens the weight connections between input and output elements and speeds up the convergence of features in the training data. Solving this problem extends the theory of logistics distribution modes and route optimization and further broadens the application of UAVs in logistics distribution.
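The abstract names a Pointer Network decoder whose attention mechanism weights input elements against the current output step, but it gives no architectural details. As a rough illustration only, the following sketch shows one decoding step of the standard additive pointer attention, with already-served delivery nodes masked out. Every name here (`pointer_attention`, `W_enc`, `W_dec`, `v`, `visited`) and every numeric value is a hypothetical, hand-picked assumption; none of it is taken from the paper, and no weights are trained.

```python
import math


def pointer_attention(encoder_states, decoder_state, W_enc, W_dec, v, visited):
    """One pointer-attention step: a probability over input nodes.

    Hypothetical sketch: score_j = v . tanh(W_enc e_j + W_dec d),
    followed by a softmax restricted to nodes not yet visited.
    """
    if all(visited):
        raise ValueError("every node is already visited; nothing to point at")

    def matvec(matrix, vector):
        # Plain matrix-vector product on nested lists.
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    dec_proj = matvec(W_dec, decoder_state)
    scores = []
    for j, e_j in enumerate(encoder_states):
        if visited[j]:
            # Masked node: exp(-inf) evaluates to 0.0 in the softmax below.
            scores.append(-math.inf)
            continue
        enc_proj = matvec(W_enc, e_j)
        hidden = [math.tanh(a + b) for a, b in zip(enc_proj, dec_proj)]
        scores.append(sum(v_k * h_k for v_k, h_k in zip(v, hidden)))

    # Numerically stable softmax (subtract the largest finite score).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Toy inputs (assumed values): three customer nodes with 2-d embeddings.
encoder_states = [[0.1, 0.4], [0.9, -0.2], [0.3, 0.3]]
decoder_state = [0.5, -0.1]
W_enc = [[1.0, 0.0], [0.0, 1.0]]
W_dec = [[0.5, 0.0], [0.0, 0.5]]
v = [1.0, 1.0]
visited = [False, True, False]  # node 1 has already been served

probs = pointer_attention(encoder_states, decoder_state, W_enc, W_dec, v, visited)
next_node = max(range(len(probs)), key=probs.__getitem__)  # greedy choice
```

In a reinforcement learning setting, the next node would typically be sampled from `probs` during training rather than chosen greedily, and the weights would be learned from a route-cost reward; both steps are outside this sketch.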

Key words

multi-UAV route optimization / collaborative delivery / deep reinforcement learning / Pointer Network model / attention mechanism

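Publication year

2025
Science and Technology Management Research (科技管理研究)
Guangdong Society for the Science of Science and S&T Management (广东省科学学与科技管理研究会)

Impact factor: 0.779
ISSN: 1000-7695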