LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update
In fully cooperative tasks, the MADDPG algorithm suffers from the credit assignment problem and poor training stability. To address these problems, an LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update was proposed. Following the ideas of difference rewards and value decomposition, an LSTM was used to extract features from trajectory sequences, and the division of the global reward was optimized to realize per-agent reward distribution. To meet the requirements of joint training, a high-quality training set was constructed. An asynchronous collaborative update method was then designed to jointly train the LSTM-MADDPG network and realize multi-agent cooperation. In a cooperative capture scenario, simulation results show that the convergence speed of the proposed algorithm is 20.51% higher than that of QMIX. After training converges, the asynchronous collaborative update method reduces the mean square error of the normalized reward by 57.59% compared with synchronous update, which improves the stability of convergence.
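The reward-distribution idea described above can be sketched minimally: an LSTM reads a trajectory, and its final hidden state is mapped through a softmax to per-agent weights that split the global reward. This is an illustrative sketch only, not the authors' implementation; `TinyLSTM`, `decompose_reward`, and all dimensions are assumptions, and the softmax split is one simple way to realize a value-decomposition-style division of the global reward.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Single-cell LSTM that maps a trajectory to per-agent reward weights.

    Hypothetical sketch: the abstract does not specify the network layout.
    """
    def __init__(self, in_dim, hid_dim, n_agents, seed=0):
        rng = np.random.default_rng(seed)
        d = in_dim + hid_dim
        # Gate weight matrices: forget, input, output, candidate.
        self.Wf = rng.normal(0, 0.1, (d, hid_dim))
        self.Wi = rng.normal(0, 0.1, (d, hid_dim))
        self.Wo = rng.normal(0, 0.1, (d, hid_dim))
        self.Wc = rng.normal(0, 0.1, (d, hid_dim))
        # Head mapping the final hidden state to one logit per agent.
        self.Wout = rng.normal(0, 0.1, (hid_dim, n_agents))
        self.hid_dim = hid_dim

    def forward(self, traj):
        h = np.zeros(self.hid_dim)
        c = np.zeros(self.hid_dim)
        for x in traj:  # traj: sequence of joint-observation feature vectors
            z = np.concatenate([x, h])
            f = sigmoid(z @ self.Wf)
            i = sigmoid(z @ self.Wi)
            o = sigmoid(z @ self.Wo)
            g = np.tanh(z @ self.Wc)
            c = f * c + i * g
            h = o * np.tanh(c)
        logits = h @ self.Wout
        e = np.exp(logits - logits.max())
        return e / e.sum()  # softmax: weights sum to 1

def decompose_reward(global_reward, weights):
    """Split the scalar global reward into per-agent rewards."""
    return global_reward * weights

# Usage: split a global reward of 10.0 among 3 agents.
lstm = TinyLSTM(in_dim=4, hid_dim=8, n_agents=3)
trajectory = np.ones((5, 4))  # 5 timesteps, 4 features each
weights = lstm.forward(trajectory)
per_agent = decompose_reward(10.0, weights)
```

Because the weights are a softmax, the per-agent rewards always sum back to the global reward, which keeps the decomposition consistent with the team objective.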
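The asynchronous collaborative update can likewise be sketched as a training schedule in which the reward-decomposition network and the MADDPG actor-critic networks are updated at different rates rather than in lockstep. The abstract does not give the exact schedule, so the period `decomp_period` and the function name below are assumptions for illustration.

```python
def async_update_schedule(total_steps, decomp_period):
    """Return, per training step, which networks are updated.

    Hypothetical schedule: the actor-critic networks update every step,
    while the LSTM reward-decomposition network updates only every
    `decomp_period` steps, decoupling the two training loops.
    """
    log = []
    for t in range(1, total_steps + 1):
        updates = ["actor_critic"]
        if t % decomp_period == 0:
            updates.append("reward_decomposition")
        log.append(updates)
    return log

# Usage: over 10 steps with a decomposition period of 5,
# the decomposition network is updated twice (at steps 5 and 10).
schedule = async_update_schedule(10, 5)
```

Staggering the updates in this way is one plausible reading of why asynchronous collaboration stabilizes joint training: the decomposition target changes more slowly than the policies that depend on it.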