Unmanned aerial vehicle (UAV) delivery is a promising approach to the "last mile" problem in logistics distribution. This work addresses the optimization of UAV delivery routes and introduces a deep reinforcement learning algorithm to decide how routes should be combined when multiple UAVs deliver collaboratively. Unlike traditional exact and heuristic algorithms, the deep reinforcement learning approach explicitly accounts for the characteristics of UAV logistics distribution, in particular by analyzing how nonlinear energy consumption affects delivery capability. A mixed-integer programming model with multiple constraints is constructed, and a multi-layer, self-updating generative feedforward network is trained with a Pointer Network (Ptr-Net) model to optimize the sequence of service-combination decisions for multiple UAVs. The results show that the deep reinforcement learning method achieves higher optimization efficiency than traditional algorithms. In addition, the attention mechanism at the decoder strengthens the weighted connections between input and output elements, which speeds up convergence during training. Solving this problem can broaden logistics distribution modes and route optimization theory, and further extend the range of UAV applications in logistics distribution.
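To make the decoder-side attention concrete, the following is a minimal sketch of a single pointer step, assuming the additive attention form of the standard Pointer Network, u_j = v^T tanh(W1 e_j + W2 d), followed by a softmax over customers that have not yet been served. It is not the authors' implementation: the names `pointer_attention`, `W1`, `W2`, `v`, and `visited` are hypothetical, the weights are random rather than trained, and the energy and payload constraints of the model are not represented.

```python
import math
import random


def pointer_attention(encoder_states, decoder_state, W1, W2, v, visited):
    """One pointer step: return a probability for each input node.

    Assumed form (standard Pointer Network additive attention):
        u_j = v^T tanh(W1 e_j + W2 d),  p = softmax(u) over unvisited nodes.
    """
    if all(visited):
        raise ValueError("no unvisited node left to point at")

    scores = []
    for j, e in enumerate(encoder_states):
        if visited[j]:
            # Mask nodes that were already served so they get probability 0.
            scores.append(-math.inf)
            continue
        hidden = [
            math.tanh(
                sum(W1[k][i] * e[i] for i in range(len(e)))
                + sum(W2[k][i] * decoder_state[i] for i in range(len(decoder_state)))
            )
            for k in range(len(v))
        ]
        scores.append(sum(v[k] * hidden[k] for k in range(len(v))))

    # Numerically stable softmax restricted to unvisited nodes.
    top = max(scores)
    exps = [0.0 if s == -math.inf else math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Toy example: 4 customer nodes with 2-D embeddings and a hidden size of 3.
random.seed(0)
dim, hidden_size = 2, 3
nodes = [[random.random() for _ in range(dim)] for _ in range(4)]
dec = [random.random() for _ in range(dim)]
W1 = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden_size)]
W2 = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden_size)]
v = [random.uniform(-1, 1) for _ in range(hidden_size)]

# Nodes 0 and 3 are assumed to have been served already.
probs = pointer_attention(nodes, dec, W1, W2, v, visited=[True, False, False, True])
next_node = max(range(len(probs)), key=probs.__getitem__)
```

In a trained model the weights would be learned, and the selected index (greedy here, sampled during training) would be appended to a UAV's route before the next decoding step.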