
An improved DDPG algorithm for UAV-assisted WSN

In order to reduce the age of information (AoI) of data collection in the unmanned aerial vehicle assisted wireless sensor network (UAV-WSN), an improved deep deterministic policy gradient (DDPG) algorithm is proposed. A Markov decision process (MDP) model that minimizes the AoI is constructed, and the convergence speed of the algorithm is improved through an experience replay matrix and a two-layer network structure. The Boltzmann strategy is introduced into the exploration policy to overcome the local-optimum problem of the UAV-WSN system when selecting the optimal action, and a multi-layer long short-term memory (LSTM) neural network model is adopted to control the degree to which information in the experience pool is remembered or forgotten, avoiding mutual interference between episodes during training. The proposed algorithm is compared with the actor-critic (AC) algorithm, the deep Q-network (DQN) algorithm, the DDPG algorithm, and the random algorithm. The results show that the improved DDPG algorithm has better convergence and stability, and can minimize the AoI.
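The Boltzmann exploration rule mentioned in the abstract can be illustrated with a short sketch. The example below is a minimal illustration only, assuming the rule is applied over a discrete set of candidate actions scored by the critic; the function name boltzmann_select, the temperature parameter tau, and the sample scores are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def boltzmann_select(q_values, tau=1.0):
    """Sample an action index with probability proportional to exp(Q / tau).

    A higher tau gives more uniform exploration; a lower tau approaches
    greedy (argmax) selection. q_values is a 1-D array of critic scores
    for a discrete set of candidate actions (an illustrative assumption).
    """
    q = np.asarray(q_values, dtype=np.float64)
    q = q - q.max()                 # shift by the max for numerical stability
    probs = np.exp(q / tau)
    probs /= probs.sum()
    return np.random.choice(len(q), p=probs)

if __name__ == "__main__":
    # Hypothetical Q(s, a_i) scores for a few candidate UAV actions.
    candidate_scores = [1.2, 0.8, 1.1, 0.3]
    action = boltzmann_select(candidate_scores, tau=0.5)
    print("chosen candidate:", action)
```

Compared with always taking the argmax, sampling in this way keeps a nonzero probability of trying lower-scored actions, which is how Boltzmann exploration helps escape local optima during action selection.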

unmanned aerial vehicle; wireless sensor network; deep deterministic policy gradient; information freshness; Boltzmann strategy; long short-term memory neural network

孙爱晶、魏德、孙驰


School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, Shaanxi, China

Shaanxi Key Laboratory of Information Communication Network and Security, Xi'an 710121, Shaanxi, China


National Natural Science Foundation of China; Special Scientific Research Project for Serving Local Areas of the Education Department of Shaanxi Province

62271391; 21JC032

2024

Journal of Xi'an University of Posts and Telecommunications

Xi'an Institute of Posts and Telecommunications

CSTPCD
Impact factor: 0.795
ISSN:1007-3264
Year, volume (issue): 2024, 29(3)