面向博弈对抗的多智能体强化学习建模与迁移技术

A Multi-agent Reinforcement Learning Modeling and Transfer Technique for Confrontation Game

Abstract: Multi-agent confrontation games require cooperation among agents, and traditional solutions based on game theory and similar methods do not suit confrontation problems in complex scenarios. Cooperative multi-agent training based on reinforcement learning has become a research hotspot in recent years. For the multi-agent confrontation game problem published by China Electronics Technology Group Corporation, a multi-agent deep reinforcement learning method based on value decomposition is designed. A separate network model is built for each agent, and a mixing (hybrid) network is introduced to connect the agents. During training, the mixing network guides the update of each agent network; during execution, each agent network runs independently. This realizes centralized training with decentralized execution. For homogeneous and heterogeneous scenarios, an efficient transfer training technique is designed to speed up training of the multi-agent reinforcement learning method across different scenarios. Experiments on homogeneous and heterogeneous confrontation games show that the value-decomposition-based multi-agent reinforcement learning method and the transfer technique effectively improve the agents' cooperative behavior and training efficiency.
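
The training scheme in the abstract (per-agent networks joined by a mixing network, trained centrally and executed independently) can be illustrated with a minimal sketch. It assumes a QMIX-style monotonic mixer with non-negative weights, which the abstract does not confirm; the linear "agent networks", the fixed mixing weights, and all names below are hypothetical stand-ins, and no training loop is shown. The sketch only checks the property that makes decentralized execution possible: when the joint value is non-decreasing in every agent's value, each agent acting greedily on its own values also maximizes the joint value.

```python
import itertools
import random


def agent_q_values(weights, observation):
    """Hypothetical agent network: one linear score per action."""
    return [sum(w * o for w, o in zip(row, observation)) for row in weights]


def mix(chosen_qs, mixing_weights, bias):
    """Assumed monotonic mixer: abs() keeps every weight non-negative,
    so the joint value never decreases when an agent's value increases."""
    return sum(abs(w) * q for w, q in zip(mixing_weights, chosen_qs)) + bias


random.seed(0)
n_agents, n_actions, obs_dim = 3, 4, 5

# Random parameters and observations stand in for trained networks and a real environment.
agent_weights = [
    [[random.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(n_actions)]
    for _ in range(n_agents)
]
observations = [[random.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(n_agents)]
mixing_weights = [random.uniform(-1, 1) for _ in range(n_agents)]
bias = 0.1

per_agent_qs = [agent_q_values(w, o) for w, o in zip(agent_weights, observations)]

# Decentralized execution: each agent picks its own greedy action.
greedy_actions = tuple(
    max(range(n_actions), key=qs.__getitem__) for qs in per_agent_qs
)


def q_tot(joint_action):
    """Joint value used only during centralized training."""
    chosen = [per_agent_qs[i][a] for i, a in enumerate(joint_action)]
    return mix(chosen, mixing_weights, bias)


# Centralized check: brute-force search over all joint actions.
best_joint = max(itertools.product(range(n_actions), repeat=n_agents), key=q_tot)

print("greedy per-agent actions:", greedy_actions)
print("joint values match:", q_tot(greedy_actions) == q_tot(best_joint))
```

In a full value-decomposition method the mixing weights would typically be produced from the global state and learned together with the agent networks; here they are fixed so that the monotonicity argument stays visible.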

Keywords:

multi-agent confrontation game; deep reinforcement learning; transfer learning; value decomposition; hybrid network; training efficiency

Authors:

李渊, 刘运韬, 徐新海, 万珂嘉

Affiliation:

Academy of Military Sciences, Beijing 100190

Chinese keywords:

多智能体对抗博弈; 深度强化学习; 迁移学习; 值分解; 混合网络; 训练效率

Funding:

National Natural Science Foundation of China

Grant number:

61902425

Publication year:

2024

DOI:

10.3969/j.issn.2096-0204.2024.02.0226

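指挥与控制学报 (Journal of Command and Control)

Indexing: CSTPCD; Peking University Core Journals

ISSN:

Year, volume (issue): 2024, 10(2)

Number of references: 5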