Uncertainty-based credit assignment for cooperative multi-agent reinforcement learning
In recent years, multi-agent cooperation under partially observable conditions has attracted extensive attention. Centralized training with decentralized execution, a general paradigm for such tasks, faces the core problem of credit assignment. Value decomposition is a representative method within this paradigm: through a mixing network, the joint state-action value function is decomposed into multiple local observation-action value functions to realize credit assignment, and it performs well on many problems. However, the single point estimate of the mixing network parameters maintained by these methods lacks any representation of uncertainty, making it difficult to handle stochastic factors in the environment effectively and often causing convergence to suboptimal policies. To alleviate this problem, this paper performs a Bayesian analysis of the mixing network and proposes an uncertainty-based method for multi-agent credit assignment, which guides credit assignment by explicitly quantifying parameter uncertainty. Considering the complex interactions among agents, this paper uses a Bayesian hypernetwork to implicitly model an arbitrarily complex posterior distribution over the mixing network parameters, avoiding the local optima that can arise from specifying the distribution type a priori. This paper compares and analyzes the performance of representative algorithms on multiple maps of the StarCraft Multi-Agent Challenge (SMAC) and verifies the effectiveness of the proposed algorithm.
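To make the core idea concrete, the sketch below shows one plausible reading of the abstract: a QMIX-style monotonic mixing network whose weights are produced by a hypernetwork that also consumes random noise, so each forward pass draws a different weight sample and the spread across samples serves as an implicit, sampling-based posterior over mixing parameters. This is a minimal illustration under those assumptions, not the paper's actual implementation; the class name BayesianHyperMixer and all dimensions are hypothetical.

```python
# Hypothetical sketch: hypernetwork(state, noise) -> sampled mixing weights,
# giving an implicit posterior over the mixing network instead of a point estimate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianHyperMixer(nn.Module):
    def __init__(self, n_agents, state_dim, noise_dim=16, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim, self.noise_dim = n_agents, embed_dim, noise_dim
        in_dim = state_dim + noise_dim  # state + noise jointly condition the weight sample
        # Hypernetworks emitting the mixing network's weights and biases.
        self.hyper_w1 = nn.Linear(in_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(in_dim, embed_dim)
        self.hyper_w2 = nn.Linear(in_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(in_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents) local Q-values; state: (batch, state_dim)
        batch = agent_qs.size(0)
        noise = torch.randn(batch, self.noise_dim, device=state.device)
        h = torch.cat([state, noise], dim=-1)
        # abs() keeps mixing weights non-negative (monotonicity, as in QMIX).
        w1 = torch.abs(self.hyper_w1(h)).view(batch, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(h).view(batch, 1, self.embed_dim)
        w2 = torch.abs(self.hyper_w2(h)).view(batch, self.embed_dim, 1)
        b2 = self.hyper_b2(h).view(batch, 1, 1)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        q_total = torch.bmm(hidden, w2) + b2  # (batch, 1, 1)
        return q_total.view(batch, 1)

# Each call draws fresh noise, i.e., a new weight sample; the standard
# deviation across samples quantifies the mixing network's uncertainty.
mixer = BayesianHyperMixer(n_agents=3, state_dim=48)
qs, s = torch.rand(5, 3), torch.rand(5, 48)
samples = torch.stack([mixer(qs, s) for _ in range(10)])  # (10, 5, 1)
print(samples.mean(0), samples.std(0))  # mean Q_tot and its uncertainty
```

In this reading, credit assignment is guided by the per-sample decomposition of Q_tot into local Q-values, while the sample spread flags states where the mixing parameters are uncertain; how the paper trains the hypernetwork and exploits that signal is not specified in the abstract.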