Journal of Ordnance Equipment Engineering, 2024, Vol. 45, Issue (7): 1-10. DOI: 10.11809/bqzbgcxb2024.07.001

Multi-agent cooperative electronic countermeasure method based on reinforcement learning

杨洋 1, 王烨 1, 康大勇 2, 陈嘉玉 3, 李姜 1, 赵华栋 1

Author information

  • 1. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
  • 2. Key Laboratory of Electro-Optical Countermeasures Test and Evaluation Technology, Luoyang 471000, Henan, China
  • 3. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China

Abstract

Traditional electronic warfare is gradually evolving into intelligent electronic warfare that integrates artificial intelligence technology, with reinforcement-learning-based cooperative electronic countermeasures among multiple UAVs as a principal scenario. To address the difficulty that multi-agent reinforcement learning algorithms converge poorly in complex, high-dimensional state-action spaces, a multi-agent dual adversarial policy gradient algorithm based on prioritized experience replay (PerMaD4) is proposed. The algorithm introduces a prioritized experience replay mechanism, and presents an adversarial Critic network and a dual Critic network to balance the relationship between actions and values and to reduce the estimation uncertainty of a single Critic network. Simulation results show that, in the same simulation scenario, the PerMaD4 algorithm converges better than other reinforcement learning algorithms and improves task completion by 8.9%.
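The abstract credits two mechanisms for the improved convergence: prioritized experience replay, which samples transitions in proportion to their TD error, and a pair of Critic networks whose smaller value estimate is used as the learning target to curb single-critic overestimation. The following is a minimal illustrative sketch of these two generic mechanisms, not the authors' PerMaD4 implementation; all class and function names are hypothetical.

```python
# Illustrative sketch (assumed, not from the paper): proportional
# prioritized experience replay plus a twin-critic Bellman target.
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly TD error shapes sampling
        self.beta = beta        # importance-sampling correction strength
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so each one is
        # sampled at least once before its TD error is known.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng):
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()
        idx = rng.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights undo the non-uniform sampling bias.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        self.priorities[idx] = np.abs(td_errors) + self.eps

def twin_critic_target(r, done, q1_next, q2_next, gamma=0.99):
    """Bellman target using the smaller of two critic estimates,
    which reduces the overestimation a single critic is prone to."""
    return r + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)
```

In a training loop, the absolute TD errors of each sampled batch would be fed back through `update_priorities`, so transitions the critics predict poorly are revisited more often.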

Key words

collaborative decision-making; reinforcement learning; policy gradient; electronic countermeasure simulation


Fund program

National Natural Science Foundation of China (61977059)

Publication year

2024
Journal of Ordnance Equipment Engineering
Sponsored by: Chongqing (Sichuan) Ordnance Society; Chongqing University of Technology
Indexed in: CSTPCD, CSCD, Peking University Core Journals (北大核心)
Impact factor: 0.478
ISSN: 2096-2304
References: 12