Intelligent interference decision algorithm with prior knowledge embedded LSTM-PPO model

To address the low decision efficiency and effectiveness and the unstable policies of multi-function radar (MFR) jamming decision algorithms based on traditional reinforcement learning models, an intelligent jamming decision algorithm based on a prior-knowledge-embedded long short-term memory (LSTM) network-proximal policy optimization (PPO) model was proposed. Firstly, the MFR jamming decision problem was formulated as a Markov decision process (MDP). Secondly, based on reward shaping theory, prior knowledge of the jamming domain was embedded into the reward function of the PPO model, and the reshaped reward function was used to guide the agent to converge quickly, thereby improving decision efficiency. Then, the strong temporal feature extraction ability of LSTM was used to capture the dynamic features of echo data and thus characterize the radar working state effectively. Finally, the extracted dynamic features were fed into the PPO model, which, guided by the embedded prior knowledge, quickly produced effective jamming decisions. Simulation results show that, compared with traditional reinforcement learning based jamming decision algorithms, the proposed algorithm achieves higher decision efficiency and effectiveness and accomplishes MFR jamming decisions efficiently and robustly.
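The abstract does not give the form of the reshaped reward, so the following is only a minimal sketch of one common way to embed prior knowledge into a reward function, namely potential-based reward shaping. The radar state names, the threat ranking in `THREAT_LEVEL`, the discount factor and the function names are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch of potential-based reward shaping.
# All names and values below are assumptions, not the paper's design.

GAMMA = 0.99  # discount factor (assumed value)

# Hypothetical prior knowledge: MFR working states ranked by threat to the jammer.
THREAT_LEVEL = {"search": 0, "acquisition": 1, "tracking": 2, "guidance": 3}


def potential(state: str) -> float:
    """Potential phi(s): higher when the radar is in a less threatening state."""
    return -float(THREAT_LEVEL[state])


def shaped_reward(env_reward: float, state: str, next_state: str,
                  gamma: float = GAMMA) -> float:
    """Return r' = r + gamma * phi(s') - phi(s)."""
    return env_reward + gamma * potential(next_state) - potential(state)
```

Under these assumptions, a jamming action that drives the radar from "tracking" back to "search" receives a positive bonus on top of the environment reward, while a transition toward a more threatening state is penalized. A shaped reward of this kind could then be used in place of the raw reward when training a PPO agent.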

interference decision; MFR; PPO; LSTM network; prior knowledge

Zhang Jingke (张静克), Yang Kai (杨凯), Li Chao (李超), Wang Hongyan (王洪雁)

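State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, Luoyang 471032, Henan, China

School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China

School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China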

jamming decision; multi-function radar; proximal policy optimization; long short-term memory network; prior knowledge

2024

Journal on Communications
China Institute of Communications

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.265
ISSN: 1000-436X
Year, volume (issue): 2024, 45(12)