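To address cooperative jamming resource allocation for unmanned aerial vehicle (UAV) swarms in highly dynamic communication countermeasure scenarios, a jamming resource allocation method is proposed that combines state normalization, advantage normalization, and entropy regularization mechanisms with the proximal policy optimization algorithm (state normalization, advantage normalization and entropy regularization-based proximal policy optimization, SANER-PPO). First, the jamming resource allocation optimization problem is formulated, with the objective of maximizing the number of target radios effectively jammed by the UAV swarm while minimizing the jamming power consumed. Then, the UAV swarm is mapped to agents, and a Markov decision process is established from the jamming resource allocation model. Finally, the SANER-PPO algorithm is used to solve the resource allocation optimization problem and generate optimized decisions on the jamming beams and jamming power of the UAV swarm. Compared with the original PPO algorithm, the SANER-PPO algorithm introduces the state normalization mechanism into the agent's decision stage to enhance the effectiveness of the algorithm, and introduces the advantage normalization and entropy regularization mechanisms into the update stage to improve its convergence speed and stability. The results show that the proposed algorithm can effectively solve the cooperative jamming resource allocation problem and has clear advantages over both the original PPO algorithm and the soft actor-critic algorithm in terms of resource consumption and the success rate of effective jamming. Furthermore, ablation experiments that progressively remove the improvement mechanisms of the proposed algorithm verify the effectiveness of the three mechanisms.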
SANER-PPO algorithm-based jamming resource allocation for UAV swarm
This paper proposes a jamming resource allocation approach based on an enhanced proximal policy optimization (PPO) algorithm to address the jamming resource allocation problem of UAV swarms in highly dynamic communication countermeasure scenarios. The enhanced algorithm, referred to in this paper as SANER-PPO, combines state normalization, advantage normalization, and entropy regularization mechanisms with the PPO algorithm. First, an optimization problem is formulated whose objective is to maximize the number of target radios successfully jammed by the UAV swarm while minimizing the total jamming power consumed by the swarm. Then, the UAV swarm is modeled as agents, and a Markov decision process is established based on the jamming resource allocation model. Finally, the SANER-PPO algorithm is used to solve the optimization problem and obtain optimized decisions on jamming beam and jamming power allocation. Compared with the original PPO algorithm, the SANER-PPO algorithm incorporates a state normalization mechanism into the decision stage of the agent to improve its effectiveness, and introduces advantage normalization and entropy regularization mechanisms into the update stage to improve the convergence speed and stability of the algorithm. Numerical results demonstrate that the proposed algorithm outperforms the original PPO algorithm and the soft actor-critic algorithm in terms of successful jamming rate and jamming power consumption. In addition, ablation experiments conducted by progressively removing the three proposed mechanisms from the algorithm validate the effectiveness of these mechanisms.
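The abstract names three mechanisms without giving their formulas, so the following is only a minimal NumPy sketch of how such mechanisms are commonly realized around a clipped PPO objective. It is not the authors' implementation. All names (`RunningNormalizer`, `normalize_advantages`, `ppo_loss`) and the default values `clip_eps=0.2` and `ent_coef=0.01` are assumptions chosen for illustration.

```python
import numpy as np


class RunningNormalizer:
    """State normalization (assumed form): running mean/variance via Welford's method."""

    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)   # running sum of squared deviations
        self.count = 0
        self.eps = eps

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def __call__(self, x):
        var = self.m2 / max(self.count, 1)
        return (np.asarray(x, dtype=float) - self.mean) / np.sqrt(var + self.eps)


def normalize_advantages(adv, eps=1e-8):
    """Advantage normalization (assumed form): zero mean, unit variance per batch."""
    adv = np.asarray(adv, dtype=float)
    return (adv - adv.mean()) / (adv.std() + eps)


def ppo_loss(new_logp, old_logp, adv, entropy, clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate with an entropy bonus, returned as a loss to minimize."""
    ratio = np.exp(new_logp - old_logp)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = np.minimum(ratio * adv, clipped * adv)
    # Entropy regularization: a larger policy entropy lowers the loss.
    return -(surrogate.mean() + ent_coef * entropy.mean())
```

In this sketch the normalizer would be applied to each observation before the policy acts (the decision stage), while `normalize_advantages` and the entropy term in `ppo_loss` would be applied when the policy is updated, mirroring the split described in the abstract.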