Optimized Architecture for Cooperative Multi-agent Reinforcement Learning
Numerous real-world tasks require the collaboration of multiple agents, often with limited communication and incomplete observations. Deep multi-agent reinforcement learning (Deep-MARL) algorithms have shown remarkable effectiveness in tackling such challenging scenarios. Among these algorithms, QTRAN and QTRAN++ are representative approaches capable of learning a broad class of joint action-value functions with strong theoretical guarantees. However, the performance of QTRAN and QTRAN++ is hindered by their reliance on a single joint action-value estimator and their neglect of preprocessing agent observations. This study introduces a novel algorithm called OPTQTRAN, which significantly improves on the performance of QTRAN and QTRAN++. First, it proposes a dual joint action-value estimator structure that leverages a decomposition network module to compute additional joint action-values. To ensure accurate computation of the joint action-value estimators, it designs an adaptive network that facilitates efficient value-function learning. In addition, it introduces a multi-unit network that groups agent observations into different units for effective estimation of the utility functions. Extensive experiments on the widely used StarCraft benchmark across diverse scenarios demonstrate that the proposed approach outperforms state-of-the-art MARL methods.
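To make the dual-estimator idea concrete, the following is a minimal PyTorch sketch of two joint action-value estimates computed from the same per-agent utilities: one from a joint network over concatenated observations and actions, and one from a decomposition network that mixes the chosen per-agent utilities. The class names, layer sizes, and the specific mixing scheme are illustrative assumptions, not the exact OPTQTRAN design.

```python
# Hypothetical sketch of a dual joint action-value estimator (not the paper's
# exact architecture); all module names and dimensions are assumptions.
import torch
import torch.nn as nn

class AgentUtility(nn.Module):
    """Per-agent utility network Q_i(o_i, .) over discrete actions."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):                      # obs: (batch, obs_dim)
        return self.net(obs)                     # (batch, n_actions)

class DualJointEstimator(nn.Module):
    """Two joint action-value estimates from the same agent utilities:
    (1) a joint network over concatenated observations and actions,
    (2) a decomposition network that mixes the chosen per-agent utilities."""
    def __init__(self, n_agents, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.agents = nn.ModuleList(
            AgentUtility(obs_dim, n_actions, hidden) for _ in range(n_agents))
        self.joint = nn.Sequential(               # estimator 1: joint Q
            nn.Linear(n_agents * (obs_dim + n_actions), hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        self.decomp = nn.Sequential(              # estimator 2: decomposition
            nn.Linear(n_agents, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, actions):
        # obs: (batch, n_agents, obs_dim); actions: (batch, n_agents) int64
        q_all = torch.stack(                      # (batch, n_agents, n_actions)
            [f(obs[:, i]) for i, f in enumerate(self.agents)], dim=1)
        q_taken = q_all.gather(2, actions.unsqueeze(-1)).squeeze(-1)
        one_hot = nn.functional.one_hot(actions, q_all.size(-1)).float()
        joint_in = torch.cat([obs, one_hot], dim=-1).flatten(1)
        q_joint = self.joint(joint_in).squeeze(-1)   # estimator 1
        q_mix = self.decomp(q_taken).squeeze(-1)     # estimator 2
        return q_joint, q_mix

# Usage: both estimates can be regressed toward the same TD target,
# giving the learner two complementary views of the joint value.
model = DualJointEstimator(n_agents=3, obs_dim=8, n_actions=5)
obs = torch.randn(4, 3, 8)
acts = torch.randint(0, 5, (4, 3))
q_joint, q_mix = model(obs, acts)
```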