Multi-agent Reinforcement Learning Algorithm Based on State Space Exploration in Sparse Reward Scenarios
FANG Baofu 1, YU Tingting 1, WANG Hao 1, WANG Zaijun 2
Author Information
- 1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China; Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machines, Hefei University of Technology, Hefei 230601, China
- 2. Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Guanghan 618307, China
Abstract
In multi-agent task scenarios, a large and diverse state space is often encountered, and in some cases the reward information provided by the external environment may be extremely limited, exhibiting sparse reward characteristics. Most existing multi-agent reinforcement learning algorithms are of limited effectiveness in such sparse reward scenarios, because relying only on accidentally discovered reward sequences leads to a slow and inefficient learning process. To address this issue, a multi-agent reinforcement learning algorithm based on state space exploration (MASSE) in sparse reward scenarios is proposed. MASSE constructs a subset space of states, maps one state from this subset, and takes it as an intrinsic goal, enabling agents to utilize the state space more fully and reduce unnecessary exploration. Agent states are decomposed into self-states and environmental states, and intrinsic rewards based on mutual information are generated by combining these two types of states with the intrinsic goals. By constructing the state subset space and generating mutual-information-based intrinsic rewards, states close to the target state and states that deepen the agent's understanding of the environment are rewarded appropriately. Consequently, agents are motivated to move more actively toward the goal while enhancing their understanding of the environment, which guides them to adapt flexibly to sparse reward scenarios. Experimental results indicate that MASSE achieves superior performance in multi-agent collaborative scenarios with varying degrees of sparsity.
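The abstract describes intrinsic rewards that combine proximity to an intrinsic goal with a mutual-information term over the decomposed states, but does not give the exact formulation. The sketch below is therefore purely illustrative, not the paper's method: the empirical MI estimator over discretized state features, the function names, and the weighting parameters `alpha` and `beta` are all assumptions introduced for illustration.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete
    sequences, e.g. discretized environmental-state features and
    goal-related features. I(X;Y) = sum p(x,y) log[p(x,y)/(p(x)p(y))]."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # joint counts
    px = Counter(xs)             # marginal counts for X
    py = Counter(ys)             # marginal counts for Y
    mi = 0.0
    for (x, y), c in pxy.items():
        # c/n divided by (px[x]/n * py[y]/n) simplifies to c*n/(px*py)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def intrinsic_reward(self_state, goal, env_mi, alpha=1.0, beta=0.1):
    """Hypothetical reward shaping: reward closeness of the agent's
    self-state to the intrinsic goal, plus a bonus proportional to a
    mutual-information estimate for the environmental state."""
    dist = math.dist(self_state, goal)  # Euclidean distance to goal
    return -alpha * dist + beta * env_mi
```

Under this toy estimator, identical sequences yield MI equal to their entropy and independent sequences yield zero, so the `beta` term rewards states whose features are informative about the goal, while the `alpha` term pulls the agent toward the intrinsic goal.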
Keywords
Reinforcement Learning / Sparse Reward / Mutual Information / Intrinsic Rewards
Funding
National Natural Science Foundation of China (61872327)
Natural Science Foundation of Anhui Province (2308085MF203)
University Collaborative Innovation Program of Anhui Province (GXXT-2022-055)
Open Fund of the Key Laboratory of Flight Techniques and Flight Safety (FZ2022KF09)
Publication Year
2024