An improved DDPG algorithm for UAV-assisted WSN
Sun Aijing (孙爱晶) 1, Wei De (魏德) 2, Sun Chi (孙驰) 2
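Author information
- 1. School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, Shaanxi, China; Shaanxi Key Laboratory of Information Communication Network and Security, Xi'an 710121, Shaanxi, China
- 2. School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, Shaanxi, China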
Abstract
To reduce the age of information (AoI) of data collection in unmanned aerial vehicle assisted wireless sensor networks (UAV-WSN), an improved deep deterministic policy gradient (DDPG) algorithm is proposed. A Markov decision process (MDP) model that minimizes AoI is constructed, and an experience replay matrix and a dual-network structure are used to improve the convergence speed of the algorithm. A Boltzmann policy is introduced into the exploration strategy to keep the UAV-WSN system from becoming trapped in a local optimum when selecting the optimal action. A multi-layer long short-term memory (LSTM) neural network model is adopted to control how much of the information in the experience pool is remembered or forgotten, which avoids mutual interference between episodes during training. The proposed algorithm is compared with the actor-critic (AC) algorithm, the deep Q-network (DQN) algorithm, the DDPG algorithm, and a random algorithm. The results show that the improved DDPG algorithm has better convergence and stability and can minimize the AoI.
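The abstract does not say how the Boltzmann strategy is combined with DDPG, so the following is only a minimal sketch of Boltzmann (softmax) exploration in general. It assumes a small discrete set of candidate UAV actions that have already been scored, for example by a critic network. The function names, the `q_values` list, and the temperature values are hypothetical and are not taken from the paper.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Return softmax(q / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_select(q_values, temperature, rng=random):
    """Sample an action index with probability proportional to exp(q / T)."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate UAV actions (assumed values).
q_values = [1.0, 2.0, 0.5]
hot = boltzmann_probabilities(q_values, temperature=10.0)   # close to uniform
cold = boltzmann_probabilities(q_values, temperature=0.1)   # close to greedy
action = boltzmann_select(q_values, temperature=1.0, rng=random.Random(0))
```

A high temperature spreads probability across all candidates, so lower-scored actions are still tried. A low temperature concentrates probability on the best-scored action. Lowering the temperature over the course of training is one common way to move from exploration toward exploitation, which is the general mechanism by which such a policy can help escape a local optimum.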
Key words
unmanned aerial vehicle/wireless sensor network/deep deterministic policy gradient/age of information/Boltzmann strategy/long short-term memory neural network
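Funding
National Natural Science Foundation of China (62271391)
Special Scientific Research Project for Local Service of the Shaanxi Provincial Department of Education (21JC032)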
Publication year
2024