Improved MATD3 algorithm and its adversarial application
Improving the training performance of multi-agent systems has long been a focus of reinforcement learning research. Building on the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm, a parameter-sharing mechanism is introduced to improve training efficiency. In addition, to alleviate the inconsistency between real rewards and auxiliary rewards, a decay factor for the auxiliary rewards is proposed, drawing on the idea of curriculum learning. The decay factor preserves the incentive for policy exploration early in training and keeps the reward consistent with the real objective late in training. The improved MATD3 algorithm is applied to a combat-vehicle adversarial game to achieve intelligent decision-making for the vehicle. The application results show that the vehicle's reward curve converges stably. The improved algorithm is also compared with the original MATD3 algorithm, and the simulation results verify that it improves both the convergence behavior and the converged reward value.
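The decayed auxiliary reward can be illustrated with a minimal sketch. The abstract does not give the form of the decay factor, so the exponential schedule, the function name `shaped_reward`, and the parameter `decay_rate` below are all assumptions chosen for illustration, not the authors' formulation.

```python
def shaped_reward(real_reward, aux_reward, step, decay_rate=0.999):
    """Combine a real reward with an auxiliary reward whose weight decays.

    Assumption: the weight decays exponentially with the training step.
    The source only states that a decay factor is applied to the
    auxiliary reward; the schedule used here is hypothetical.
    """
    weight = decay_rate ** step  # 1.0 at step 0, approaches 0 as step grows
    return real_reward + weight * aux_reward


# Early in training the auxiliary term contributes fully,
# late in training the total is dominated by the real reward.
early = shaped_reward(real_reward=1.0, aux_reward=2.0, step=0)
late = shaped_reward(real_reward=1.0, aux_reward=2.0, step=10_000)
```

Under this assumed schedule, the auxiliary term guides exploration at the start and fades out, so the optimized objective approaches the real reward as training proceeds.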