Trajectory design for multi-UAV-assisted mobile edge computing based on multi-agent deep reinforcement learning

Unmanned Aerial Vehicle (UAV)-assisted Mobile Edge Computing (MEC) networks can provide high-quality computing services to ground User Equipment (UE), but real-time trajectory design for multiple UAVs remains a challenge. To address this problem, a trajectory design algorithm based on multi-agent deep reinforcement learning is proposed, which uses the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) framework to design the UAV trajectories cooperatively. Because limited battery capacity is a key constraint on UAV network performance, the optimization problem takes the sum of the UAVs' energy efficiencies as its objective and jointly optimizes the trajectories of the UAV cluster and the offloading decisions of the UEs. Each agent interacts with the edge computing network environment, observes its own local state, and obtains trajectory coordinates from its Actor network. The Critic network is trained jointly on the actions and observations of the other agents, which in turn improves the trajectory policy output by the Actor network. Simulation results show that the MADDPG-based trajectory design algorithm has good convergence and robustness and improves UAV energy efficiency effectively. Compared with the random flight algorithm, the circular flight algorithm, and the Deep Deterministic Policy Gradient (DDPG) algorithm, the proposed algorithm improves performance by up to 120%, by up to 20%, and by 5% to 10%, respectively.
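The data flow described in the abstract (each Actor acting on its own local observation, each Critic scoring the joint observations and actions of all agents) can be illustrated with a minimal sketch. This is not the authors' implementation: the number of UAVs, the observation and action sizes, the linear "networks", and the definition of energy efficiency as processed bits per joule are all assumptions made for illustration, and no training step is shown.

```python
import math
import random

random.seed(0)

N_UAV = 3     # number of UAV agents (assumed)
OBS_DIM = 4   # size of each local observation (assumed)
ACT_DIM = 2   # action = bounded (x, y) step of the trajectory (assumed)


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def rand_matrix(rows, cols):
    """Small random weights standing in for a trained network."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class Actor:
    """Decentralized policy: sees only its own local observation."""

    def __init__(self):
        self.w = rand_matrix(ACT_DIM, OBS_DIM)

    def act(self, obs):
        # tanh bounds each coordinate step to [-1, 1]
        return [math.tanh(a) for a in linear(self.w, obs)]


class CentralCritic:
    """Centralized Q-function: sees every agent's observation and action."""

    def __init__(self):
        self.w = rand_matrix(1, N_UAV * (OBS_DIM + ACT_DIM))

    def q_value(self, all_obs, all_acts):
        joint = [v for o in all_obs for v in o] + [v for a in all_acts for v in a]
        return linear(self.w, joint)[0]


def sum_energy_efficiency(bits, energy):
    """Assumed objective: sum over UAVs of processed bits per joule."""
    return sum(b / e for b, e in zip(bits, energy))


actors = [Actor() for _ in range(N_UAV)]
critics = [CentralCritic() for _ in range(N_UAV)]  # one critic per agent

observations = [[random.random() for _ in range(OBS_DIM)] for _ in range(N_UAV)]
actions = [actor.act(obs) for actor, obs in zip(actors, observations)]
q_values = [critic.q_value(observations, actions) for critic in critics]

# Hypothetical per-UAV workload (bits) and energy use (joules)
reward = sum_energy_efficiency(bits=[4e6, 2e6, 3e6], energy=[200.0, 100.0, 300.0])
```

At execution time only `Actor.act` is needed, so each UAV can choose its trajectory step from local information; the joint input to `CentralCritic` is used during training only.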

UAV trajectory design; Mobile Edge Computing (MEC); reinforcement learning; Multi-Agent Deep Deterministic Policy Gradient (MADDPG)

Xu Shaoyi (徐少毅), Yang Lei (杨磊)

School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044

2024

Journal of Beijing Jiaotong University
Beijing Jiaotong University

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.525
ISSN: 1673-0291
Year, volume (issue): 2024, 48(5)