Multi-Channel Dynamic Spectrum Access Based on Multi-Agent Proximal Policy Optimization

To improve communication efficiency and ensure user fairness when applying dynamic spectrum access (DSA) in multi-user, multi-channel communication scenarios, this paper proposes MAPPO-DSA, an algorithm based on multi-agent proximal policy optimization (MAPPO). Single-channel access wastes spectrum when several channels are idle at the same time, so the algorithm adopts multi-channel access instead. Multi-channel access, however, makes the state and action spaces grow exponentially, which raises the computational cost and makes learning harder. To address this, the paper introduces the MAPPO deep reinforcement learning (DRL) algorithm to learn and optimize access strategies efficiently in complex environments. User fairness is ensured by designing and optimizing reinforcement learning elements in MAPPO, such as the observation and the reward, and by sharing network parameters. Experimental results in different scenarios show that MAPPO-DSA learns near-optimal access strategies, that its network throughput approaches the theoretical upper bound in some scenarios, and that it significantly outperforms existing algorithms while effectively ensuring user fairness.
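The exponential growth of the action space mentioned in the abstract can be illustrated with a minimal counting sketch. It assumes, purely for illustration, that under single-channel access a user either stays silent or picks one of N channels, and that under multi-channel access a user may transmit on any subset of the N channels. This is an assumed action model, not necessarily the one defined in the paper, and all function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def single_channel_actions(num_channels: int) -> int:
    """Per-user actions if the user stays silent or picks exactly one channel (assumed model)."""
    return num_channels + 1


def multi_channel_actions(num_channels: int) -> int:
    """Per-user actions if the user may transmit on any subset of channels (assumed model)."""
    return 2 ** num_channels


def enumerate_multi_channel_actions(num_channels: int) -> list[tuple[int, ...]]:
    """Explicitly list every channel subset; only practical for small channel counts."""
    channels = range(num_channels)
    return [
        subset
        for size in range(num_channels + 1)
        for subset in combinations(channels, size)
    ]


def joint_actions(per_user_actions: int, num_users: int) -> int:
    """Size of the joint action space when every user chooses independently."""
    return per_user_actions ** num_users


if __name__ == "__main__":
    users = 3
    for n in (2, 4, 8):
        single = joint_actions(single_channel_actions(n), users)
        multi = joint_actions(multi_channel_actions(n), users)
        print(f"channels={n} users={users} single={single} multi={multi}")
```

Under these assumptions, 4 channels and 3 users give 125 joint actions for single-channel access and 4096 for multi-channel access, which suggests why a learned policy is preferred over exhaustive search as the number of channels grows.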

Keywords: dynamic spectrum access; deep reinforcement learning; multi-agent policy optimization; multi-channel access

Authors: 陈平平 (Chen Pingping), 张旭 (Zhang Xu), 谢肇鹏 (Xie Zhaopeng), 丘毓萍 (Qiu Yuping), 方毅 (Fang Yi)

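School of Advanced Manufacturing, Fuzhou University, Jinjiang, Fujian 362251

College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108

School of Information Engineering, Guangdong University of Technology, Guangzhou, Guangdong 510006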

Funding: National Natural Science Foundation of China (62171135, 62322106, 62071131); Natural Science Foundation of Fujian Province (2022J06010)

2024

Acta Electronica Sinica (电子学报)
Chinese Institute of Electronics (中国电子学会)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.237
ISSN: 0372-2112
Year, volume (issue): 2024, 52(6)