A dynamic spectrum intelligent jamming algorithm based on deep reinforcement learning

Reinforcement learning has shown great potential for improving the efficiency of electromagnetic spectrum control and jamming countermeasures. Frequency-hopping communication systems have strong anti-jamming capability, and traditional jamming methods perform poorly against them; this paper therefore applies deep reinforcement learning (DRL) to intelligent electromagnetic jamming in dynamic spectrum environments. First, a partially observable Markov decision process (POMDP) is introduced to model the communication countermeasure process between the jammer and the frequency-hopping communication users. Then, a jamming decision network with spectrum feature mining and memory backtracking capabilities is designed, based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network, yielding the DRL-based dynamic spectrum intelligent jamming (DSIJ) algorithm. Simulation results show that the proposed DSIJ algorithm improves the jamming success rate by approximately 18% over the traditional deep Q-network (DQN) algorithm and by approximately 68% over the traditional sweep jamming method, demonstrating its effectiveness and its advantage for intelligent jamming in dynamic spectrum environments.
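The abstract's core idea (a jammer that sees only past spectrum occupancy and needs memory to predict the next hop) can be illustrated with a toy sketch. This is not the paper's DSIJ network: the CNN and LSTM are replaced here by a lookup table keyed on the last few sensed channels, learning is a one-step (discount 0) epsilon-greedy value update, and the hop pattern, channel count, and all function names are hypothetical choices made for illustration only.

```python
import random

N_CHANNELS = 8
# Hypothetical hop pattern. Channels 0 and 3 recur with different successors,
# so the most recent observation alone does not determine the next channel.
HOP_PATTERN = [0, 3, 0, 5, 3, 6, 1, 4]


def user_channel(t):
    """Channel occupied by the frequency-hopping user in slot t."""
    return HOP_PATTERN[t % len(HOP_PATTERN)]


def sweep_success_rate(n_slots):
    """Baseline sweep jammer: jam channel (t mod N_CHANNELS) in slot t."""
    hits = sum(1 for t in range(n_slots) if t % N_CHANNELS == user_channel(t))
    return hits / n_slots


def learned_success_rate(train_slots=20000, eval_slots=800, memory=2,
                         alpha=0.5, epsilon=0.1, seed=0):
    """Tabular jammer whose state is the last `memory` sensed channels.

    The history tuple is a crude stand-in for recurrent memory. The toy user
    never reacts to jamming, so a one-step value update is sufficient here.
    """
    rng = random.Random(seed)
    q = {}                       # history tuple -> list of per-channel values
    history = (0,) * memory      # placeholder before anything has been sensed
    hits = 0
    for t in range(train_slots + eval_slots):
        training = t < train_slots
        values = q.setdefault(history, [0.0] * N_CHANNELS)
        if training and rng.random() < epsilon:
            action = rng.randrange(N_CHANNELS)          # explore
        else:
            action = max(range(N_CHANNELS), key=values.__getitem__)  # exploit
        channel = user_channel(t)
        reward = 1.0 if action == channel else 0.0      # 1 if the hop was jammed
        if training:
            values[action] += alpha * (reward - values[action])
        else:
            hits += int(reward)
        # The jammer senses the occupied channel only after the slot ends.
        history = history[1:] + (channel,)
    return hits / eval_slots


if __name__ == "__main__":
    print("sweep jammer:", sweep_success_rate(800))
    print("learned, memory=2:", learned_success_rate(memory=2))
    print("learned, memory=1:", learned_success_rate(memory=1))
```

In this toy, a history of one observation cannot exceed a 0.75 success rate, because two of the eight slots in the pattern follow an ambiguous observation, whereas a history of two observations identifies every slot. That gap is the partial-observability effect that motivates a memory component; the specific numbers here say nothing about the paper's reported results.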

Keywords: deep reinforcement learning (DRL); frequency-hopping communication; intelligent jamming decision; partially observable Markov decision process (POMDP)

Authors: Zhang Lan (张兰), Zhang Biao (张彪), Liang Tianyi (梁天一), Zhu Huijie (朱辉杰)

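National Key Laboratory of Electromagnetic Space Security, Jiaxing, Zhejiang 314033, China

The 36th Research Institute of China Electronics Technology Group Corporation, Jiaxing, Zhejiang 314033, China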

2024

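Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)
Nanjing University of Posts and Telecommunications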

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.486
ISSN: 1673-5439
Year, volume (issue): 2024, 44(6)