Journal of Xi'an University of Posts and Telecommunications (西安邮电大学学报), 2024, Vol. 29, Issue 3: 1-11. DOI: 10.13682/j.issn.2095-6533.2024.03.001

An improved DDPG algorithm for UAV-assisted WSN

Sun Aijing (孙爱晶) 1, Wei De (魏德) 2, Sun Chi (孙驰) 2

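Author information

  • 1. School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121; Shaanxi Key Laboratory of Information Communication Network and Security, Xi'an, Shaanxi 710121
  • 2. School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121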

Abstract

To reduce the age of information (AoI) of data collection in unmanned aerial vehicle assisted wireless sensor networks (UAV-WSN), an improved deep deterministic policy gradient (DDPG) algorithm is proposed. A Markov decision process (MDP) model that minimizes the AoI is constructed, and the convergence speed of the algorithm is improved through an experience replay matrix and a two-layer network structure. A Boltzmann strategy is introduced into the exploration strategy to keep the UAV-WSN system from being trapped in a local optimum when selecting the optimal action. A multi-layer long short-term memory (LSTM) neural network model is adopted to control how much of the information in the experience pool is remembered or forgotten, which avoids mutual influence between episodes during training. The proposed algorithm is compared with the actor-critic (AC) algorithm, the deep Q-network (DQN) algorithm, the DDPG algorithm, and a random algorithm. The results show that the improved DDPG algorithm has better convergence and stability, and can minimize the AoI.
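As a rough illustration of two ideas named in the abstract, the sketch below shows Boltzmann (softmax) action selection and a slotted AoI update. It is not the authors' implementation: the abstract does not say how the Boltzmann strategy is combined with DDPG's continuous actions, so the example assumes a small discrete set of candidate actions (one per sensor) with hypothetical value estimates. The AoI model (every age grows by one per slot and resets to 1 when the UAV collects from that sensor) is likewise a common simplification assumed here, and the names `boltzmann_probs`, `boltzmann_select`, and `step_aoi` are illustrative.

```python
import math
import random


def boltzmann_probs(values, temperature=1.0):
    """Softmax over value estimates: p_i is proportional to exp(values[i] / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(values)
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_select(values, temperature=1.0, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


def step_aoi(aoi, visited):
    """Advance every sensor's AoI by one slot; reset the visited sensor to 1.

    Assumption: slotted time, and a fresh sample is collected at the visit.
    Pass visited=None when no sensor is served in this slot.
    """
    return [1 if k == visited else a + 1 for k, a in enumerate(aoi)]


if __name__ == "__main__":
    aoi = [3, 7, 2]
    # Hypothetical value estimates: the current AoI itself, so staler
    # sensors are more likely to be chosen.
    choice = boltzmann_select(aoi, temperature=2.0, rng=random.Random(0))
    aoi = step_aoi(aoi, choice)
    print(choice, aoi)
```

A high temperature makes the selection nearly uniform (more exploration), while a low temperature approaches greedy selection; this is the usual reason for preferring Boltzmann sampling over always taking the best-looking action when local optima are a concern.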

Key words

unmanned aerial vehicle/wireless sensor network/deep deterministic policy gradient/age of information/Boltzmann strategy/long short-term memory neural network


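Funding

National Natural Science Foundation of China (62271391)

Special Scientific Research Project for Serving Local Development, Shaanxi Provincial Department of Education (21JC032)

Publication year

2024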
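Journal of Xi'an University of Posts and Telecommunications (西安邮电大学学报)
Xi'an Institute of Posts and Telecommunications (西安邮电学院)

CSTPCD
Impact factor: 0.795
ISSN: 1007-3264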