Dynamic Penetration Decision of Loitering Munition Group Based on Knowledge-assisted Reinforcement Learning
Loitering munition group penetration control decision (LMGPCD) is key to improving the autonomy and intelligence of loitering munition group combat. Penetration maneuver commands for a loitering munition group are difficult to generate online in a dynamic environment containing interceptors and air defenses. To address this problem, a knowledge-assisted reinforcement learning-based LMGPCD algorithm is proposed. The state space and reward function are improved using domain knowledge and rule knowledge to enhance the generalization ability and training convergence speed of the algorithm. An LMGPCD decision framework based on the soft actor-critic (SAC) algorithm is constructed to increase the exploration efficiency of the algorithm. As the number of missiles and threats increases, the solution space narrows and the algorithm lacks efficient initial training experience; a method that applies expert experience together with imitation learning is used to compensate for this. Experimental results show that, compared with other algorithms, the proposed algorithm generates more effective penetration maneuver commands in real time in a dynamic environment, which verifies its effectiveness.
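One common way to supply an off-policy learner such as SAC with useful early experience is to keep expert transitions alongside the agent's own transitions in the replay buffer and to sample from both. The following is a minimal sketch of that general idea only, not the method of this paper: the class `MixedReplayBuffer`, its `expert_fraction` parameter, and the transition tuples are all hypothetical names introduced for illustration.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Hypothetical replay buffer mixing expert and agent transitions.

    Assumption: expert transitions are kept permanently, while agent
    transitions are stored in a bounded FIFO queue.
    """

    def __init__(self, capacity, expert_fraction=0.25, seed=0):
        self.agent = deque(maxlen=capacity)  # oldest agent data is evicted first
        self.expert = []                     # expert data is never evicted
        self.expert_fraction = expert_fraction
        self.rng = random.Random(seed)

    def add_expert(self, transition):
        self.expert.append(transition)

    def add_agent(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        """Sample with replacement, drawing a fixed share from expert data."""
        if not self.expert and not self.agent:
            raise ValueError("buffer is empty")
        if not self.agent:
            n_expert = batch_size            # only expert data is available
        elif not self.expert:
            n_expert = 0                     # only agent data is available
        else:
            n_expert = int(round(batch_size * self.expert_fraction))
        batch = []
        if n_expert:
            batch += self.rng.choices(self.expert, k=n_expert)
        if batch_size - n_expert:
            batch += self.rng.choices(list(self.agent), k=batch_size - n_expert)
        return batch


# Usage: before any interaction, every sampled transition comes from the expert.
buf = MixedReplayBuffer(capacity=100, expert_fraction=0.25)
for i in range(10):
    buf.add_expert(("expert", i))
warmup_batch = buf.sample(8)

# Once agent data exists, the batch is split according to expert_fraction.
for i in range(50):
    buf.add_agent(("agent", i))
mixed_batch = buf.sample(8)
```

In this sketch the expert share is fixed; annealing `expert_fraction` toward zero as training progresses is another plausible design choice, again stated here as an assumption rather than something the abstract specifies.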