Resource Allocation Algorithm of Urban Rail Train-to-Train Communication with A2C-ac

In urban rail transit train control systems, Train-to-Train (T2T) communication, a new-generation train communication mode, uses direct communication between trains to reduce communication delay and improve train operation efficiency. Where T2T communication coexists with Train-to-Ground (T2G) communication, reusing T2G links causes interference. To address this interference while guaranteeing user communication quality, this paper proposes an improved Advantage Actor-Critic-ac (A2C-ac) resource allocation algorithm based on Multi-Agent Deep Reinforcement Learning (MADRL). First, with system throughput as the optimization objective and each T2T transmitter as an agent, the policy network uses a hierarchical output structure to guide the agent in selecting the spectrum resource to reuse and the power level. The agent then takes the corresponding action and interacts with the T2T communication environment, obtaining the throughput of T2G users and T2T users in that time slot. The value network evaluates the two separately, and a weight factor β is used to tailor a weighted Temporal Difference (TD) error for each agent, so that the neural network parameters can be optimized flexibly. Finally, the agents jointly select the best spectrum resources and power levels according to the trained model. Simulation results show that, compared with the A2C and Deep Q-Network (DQN) algorithms, the proposed algorithm clearly improves convergence speed, T2T successful access rate, and throughput.
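The two mechanisms named in the abstract, a hierarchical policy output and a per-agent weighted TD error, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not state how the hierarchy is wired or how the weight factor combines the two critics' errors. Here the power level is assumed to be sampled conditionally on the chosen channel, and the weighted TD error is assumed to be a convex combination with weight `beta`. The function names, the discount `gamma`, and the logits are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(channel_logits, power_logits, rng):
    """Hierarchical action selection (assumed structure).

    channel_logits: one logit per reusable T2G spectrum resource.
    power_logits:   power_logits[c] holds one logit per power level,
                    conditioned on channel c (an assumption).
    """
    ch_probs = softmax(channel_logits)
    channel = rng.choices(range(len(ch_probs)), weights=ch_probs)[0]
    pw_probs = softmax(power_logits[channel])
    power = rng.choices(range(len(pw_probs)), weights=pw_probs)[0]
    return channel, power


def weighted_td_error(beta, r_t2g, r_t2t,
                      v_t2g, v_t2t, v_t2g_next, v_t2t_next, gamma=0.99):
    """Combine the TD errors of two value estimates with weight beta.

    The convex combination is an assumed form; the abstract only says
    that a weight factor produces a per-agent weighted TD error.
    """
    delta_t2g = r_t2g + gamma * v_t2g_next - v_t2g  # T2G throughput critic
    delta_t2t = r_t2t + gamma * v_t2t_next - v_t2t  # T2T throughput critic
    return beta * delta_t2g + (1.0 - beta) * delta_t2t


if __name__ == "__main__":
    rng = random.Random(0)
    # 4 hypothetical channels, 3 hypothetical power levels per channel.
    channel, power = select_action([0.0] * 4, [[0.0] * 3 for _ in range(4)], rng)
    delta = weighted_td_error(0.5, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0)
    print(channel, power, delta)
```

In an actor-critic update, a scalar such as `delta` would scale the log-probability gradient of the chosen (channel, power) pair. Setting `beta` closer to 1 would make an agent weigh protection of T2G throughput more heavily, under the assumed combination.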

Keywords: Urban rail transit system; Resource allocation; Train-to-Train (T2T); Multi-Agent Deep Reinforcement Learning (MADRL); Advantage Actor-Critic-ac (A2C-ac) algorithm

Wang Ruifeng (王瑞峰), Zhang Ming (张明), Huang Ziheng (黄子恒), He Tao (何涛)

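School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China

Automatic Control Research Institute, Lanzhou Jiaotong University, Lanzhou 730070, China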

Funding: Joint Fund for Railway Basic Research, National Natural Science Foundation of China (U2268206)

2024

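Journal of Electronics & Information Technology (电子与信息学报)
Institute of Electronics, Chinese Academy of Sciences; Department of Information Sciences, National Natural Science Foundation of China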

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.302
ISSN: 1009-5896
Year, volume (issue): 2024, 46(4)