Science China Information Sciences (English edition), 2024, Vol. 67, Issue 3: 221-241. DOI: 10.1007/s11432-023-3906-3

Joint UAV trajectory and communication design with heterogeneous multi-agent reinforcement learning

Xuanhan ZHOU (1), Jun XIONG (1), Haitao ZHAO (1), Xiaoran LIU (1), Baoquan REN (2), Xiaochen ZHANG (1), Jibo WEI (1), Hao YIN (2)
Author information

  • 1. College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
  • 2. Systems Engineering Institute, Academy of Military Sciences PLA, Beijing 100091, China

Abstract

Unmanned aerial vehicles (UAVs) are recognized as effective means for delivering emergency communication services when terrestrial infrastructures are unavailable. This paper investigates a multi-UAV-assisted communication system, where we jointly optimize UAVs' trajectories, user association, and ground users' (GUs') transmit power to maximize a defined fairness-weighted throughput metric. Owing to the dynamic nature of UAVs, this problem has to be solved in real time. However, the problem's non-convex and combinatorial attributes pose challenges for conventional optimization-based algorithms, particularly in scenarios without central controllers. To address this issue, we propose a multi-agent deep reinforcement learning (MADRL) approach to provide distributed and online solutions. In contrast to previous MADRL-based methods considering only UAV agents, we model UAVs and GUs as heterogeneous agents sharing a common objective. Specifically, UAVs are tasked with optimizing their trajectories, while GUs are responsible for selecting a UAV for association and determining a transmit power level. To learn policies for these heterogeneous agents, we design a heterogeneous coordinated QMIX (HC-QMIX) algorithm to train local Q-networks in a centralized manner. With these well-trained local Q-networks, UAVs and GUs can make individual decisions based on their local observations. Extensive simulation results demonstrate that the proposed algorithm outperforms state-of-the-art benchmarks in terms of total throughput and system fairness.

Key words

unmanned aerial vehicle (UAV); trajectory design; resource allocation; multi-agent deep reinforcement learning (MADRL); heterogeneous agents

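Funding

National Natural Science Foundation of China (62371462)

National Natural Science Foundation of China (61931020)

National Natural Science Foundation of China (62101569)

National Natural Science Foundation of China (U19B2024)

Natural Science Foundation of Hunan Province (2022JJ10068)

Science and Technology Innovation Program of Hunan Province (2022RC1093)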

Publication year

2024

Journal information

Science China Information Sciences (English edition)
Chinese Academy of Sciences
Indexed in: CSTPCD, EI
Impact factor: 0.715
ISSN: 1674-733X
Number of references: 55