DDPG-based performance optimization algorithm for IRS-assisted simultaneous wireless information and power transfer systems

Luo Liping (罗丽平)¹, Pan Weimin (潘伟民)¹


Author information

1. College of Electronic Information, Guangxi Minzu University, Nanning 530006, Guangxi, China

Abstract

For an intelligent reflecting surface (IRS)-assisted multiple-input single-output (MISO) simultaneous wireless information and power transfer (SWIPT) system, the beamforming vector at the base station and the reflective beamforming vector of the IRS are jointly optimized to maximize the information transmission rate (spectral efficiency), subject to the maximum transmit power of the base station, the unit-modulus constraint on the IRS reflection phase-shift matrix, and the minimum energy constraint at the energy receiver. To solve this non-convex optimization problem, a deep deterministic policy gradient (DDPG) algorithm based on deep reinforcement learning is proposed. Simulation results show that the average reward of the DDPG algorithm depends on the learning rate. With an appropriately chosen learning rate, the DDPG algorithm achieves an average mutual information close to that of the traditional optimization algorithm, while its running time is significantly shorter than that of the traditional non-convex optimization algorithm. Even when the number of antennas and the number of reflecting elements are increased, the DDPG algorithm still converges in a short time. This indicates that the DDPG algorithm effectively improves computational efficiency and is better suited to communication services with stringent real-time requirements.
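The abstract names the objective and the three constraints but not the system model, so the following is only a minimal sketch of how a candidate solution might be scored, not the authors' formulation. It assumes one information receiver and one energy receiver, a single transmit beam `w`, a linear energy-harvesting model with efficiency `eta`, and hypothetical channel names (`h_d`, `h_r`, `g_d`, `g_r`, `G`). The helper `project_action` shows one common way to turn an unconstrained actor output into a point that satisfies the transmit-power and unit-modulus constraints; the zero reward for violating the energy constraint is likewise an illustrative choice.

```python
import numpy as np

def project_action(raw, M, N, p_max):
    """Map a real vector of length 2*(M+N) to a feasible (w, theta).

    Hypothetical layout: [Re(w), Im(w), Re(theta), Im(theta)].
    """
    raw = np.asarray(raw, dtype=float)
    w = raw[:M] + 1j * raw[M:2 * M]
    t = raw[2 * M:2 * M + N] + 1j * raw[2 * M + N:]
    # Scale the beamformer so that ||w||^2 == p_max (assumes w is non-zero).
    w = np.sqrt(p_max) * w / np.linalg.norm(w)
    # Keep only the phase of each reflection coefficient, so |theta_n| == 1.
    theta = np.exp(1j * np.angle(t))
    return w, theta

def effective_channel(h_d, h_r, G, theta):
    """Cascaded channel h_r^H diag(theta) G + h_d^H as a length-M vector."""
    return (h_r.conj() * theta) @ G + h_d.conj()

def evaluate(w, theta, ch, noise_power, eta, e_min):
    """Return (rate in bit/s/Hz, harvested power, reward)."""
    h_eff = effective_channel(ch["h_d"], ch["h_r"], ch["G"], theta)
    g_eff = effective_channel(ch["g_d"], ch["g_r"], ch["G"], theta)
    rate = np.log2(1.0 + np.abs(h_eff @ w) ** 2 / noise_power)
    energy = eta * np.abs(g_eff @ w) ** 2
    # Assumed reward shaping: the rate if the energy constraint holds, else 0.
    reward = rate if energy >= e_min else 0.0
    return rate, energy, reward
```

In a DDPG loop under these assumptions, the actor network would emit `raw`, `project_action` would enforce feasibility, and the value returned as `reward` would be fed back to the critic.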

Key words

multiple-input single-output/simultaneous wireless information and power transfer/intelligent reflecting surface/beamforming/deep deterministic policy gradient


Funding

Guangxi Science and Technology Major Project (AA23073006)

Guangxi Minzu University Graduate Innovation Program (gxunchxs2022298; listed as gxunchxs2022098 in the English record)

Publication year

2024

Chinese Journal on Internet of Things (物联网学报)

Posts & Telecom Press Co., Ltd.


CSTPCD

ISSN: 2096-3750

Number of references: 30
