Local observation reconstruction for Ad-Hoc cooperation

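Chinese abstract: In multi-agent reinforcement learning research, a key problem is how to perform Ad-Hoc cooperation, that is, how to adapt to teammates whose types and number vary. Existing methods either assume strong prior knowledge or rely on hard-coded rules for cooperation; they lack generality and cannot generalize to more general Ad-Hoc cooperation scenarios. To address this problem, a local observation reconstruction algorithm for Ad-Hoc cooperation is proposed. It uses an attention mechanism and a sampling network to reconstruct local observations, so that the algorithm recognizes and fully exploits high-dimensional state representations across different situations and achieves zero-shot generalization in Ad-Hoc cooperation scenarios. Comparison and analysis against representative algorithms on the StarCraft micromanagement environment and on Ad-Hoc cooperation scenarios verify the effectiveness of the algorithm.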

English title: Local observation reconstruction for Ad-Hoc cooperation

English abstract: In recent years, multi-agent reinforcement learning has received a lot of attention from researchers. A key problem in this field is how to perform ad-hoc cooperation, i.e., how to adapt to teammates whose variety and number change. Existing methods either make strong prior-knowledge assumptions or use hard-coded protocols for cooperation; they lack generality and cannot be generalized to more general ad-hoc cooperation scenarios. To address this problem, this paper proposes a local observation reconstruction algorithm for ad-hoc cooperation, which uses attention mechanisms and sampling networks to reconstruct local observations, enabling the algorithm to recognize and make full use of high-dimensional state representations in different situations and to achieve zero-shot generalization in ad-hoc cooperation scenarios. The algorithm is compared with representative algorithms on the StarCraft micromanagement environment and on ad-hoc cooperation scenarios to verify its effectiveness.
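The abstract does not give implementation details, so the following is only an illustrative sketch of why an attention mechanism suits a changing number of teammates: attention pooling maps a variable-size set of observed entities to a fixed-size vector. It is not the paper's algorithm, and it leaves out the sampling network entirely. The function name `attention_pool`, the weight matrices `w_q`, `w_k`, `w_v`, and all dimensions are assumptions made for this example.

```python
import numpy as np

def attention_pool(own_feat, entity_feats, w_q, w_k, w_v):
    """Scaled dot-product attention over a variable-size set of entities.

    own_feat:     (feat_dim,)   the agent's own features (used as the query)
    entity_feats: (n, feat_dim) features of the n teammates currently observed
    Returns a fixed-size vector of shape (d,) and the attention weights (n,).
    """
    q = own_feat @ w_q                       # (d,)
    k = entity_feats @ w_k                   # (n, d)
    v = entity_feats @ w_v                   # (n, d)
    scores = k @ q / np.sqrt(q.shape[0])     # (n,)
    scores = scores - scores.max()           # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ v, weights

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
feat_dim, d = 6, 4
w_q, w_k, w_v = (rng.normal(size=(feat_dim, d)) for _ in range(3))
own = rng.normal(size=feat_dim)

for n_teammates in (2, 5):
    teammates = rng.normal(size=(n_teammates, feat_dim))
    pooled, weights = attention_pool(own, teammates, w_q, w_k, w_v)
    # The pooled vector has the same shape for any team size.
    print(n_teammates, pooled.shape, weights.shape)
```

Because the output shape does not depend on `n_teammates`, a downstream policy network could in principle consume it unchanged when the team composition differs from training; whether and how the paper does this is not stated in the abstract.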

English keywords:

multi-agent; deep reinforcement learning; credit assignment; Ad-Hoc cooperation

Authors:

Chen Hao (陈皓), Yang Likun (杨立昆), Yin Qiyue (尹奇跃), Huang Kaiqi (黄凯奇)

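Author affiliations:

Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190

School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049

CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031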

Keywords:

multi-agent; deep reinforcement learning; credit assignment; Ad-Hoc cooperation

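Funding:

National Natural Science Foundation of China; Beijing Science and Technology Innovation Program; Youth Innovation Promotion Association of the Chinese Academy of Sciences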

Project numbers:

61876181; Z19110000119043; QYZDB-SSWJSC006

Publication year:

2024

DOI:

10.7523/j.ucas.2022.028

Journal of University of Chinese Academy of Sciences

University of Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.614

ISSN: 2095-6134

Year, volume (issue): 2024, 41(1)

Number of references: 44