Multi-agent Reinforcement Learning Algorithm Based on State Space Exploration in Sparse Reward Scenarios
In multi-agent task scenarios, a large and diverse state space is often encountered, and in some cases the reward information provided by the external environment may be extremely limited, exhibiting sparse-reward characteristics. Most existing multi-agent reinforcement learning algorithms are of limited effectiveness in such sparse reward scenarios, because relying only on accidentally discovered reward sequences leads to a slow and inefficient learning process. To address this issue, a multi-agent reinforcement learning algorithm based on state space exploration (MASSE) in sparse reward scenarios is proposed. MASSE constructs a subset space of states, maps one state from this subset and takes it as an intrinsic goal, enabling agents to utilize the state space more fully and to reduce unnecessary exploration. Agent states are decomposed into self-states and environmental states, and intrinsic rewards based on mutual information are generated by combining these two types of states with the intrinsic goal. By constructing the state subset space and generating mutual-information-based intrinsic rewards, states close to the goal state and states that improve the agents' understanding of the environment are rewarded appropriately. Consequently, agents are motivated to move more actively towards the goal while deepening their understanding of the environment, which guides them to adapt flexibly to sparse reward scenarios. Experimental results indicate that MASSE achieves superior performance in multi-agent collaborative scenarios with varying degrees of reward sparsity.
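The abstract describes two mechanisms: mapping an intrinsic goal out of a constructed subset of the state space, and shaping a mutual-information-based intrinsic reward from the agent's self-state, the environment state and that goal. The sketch below is only an illustrative reconstruction for discrete states, not the paper's actual implementation; the plug-in MI estimator, the uniform goal-sampling rule, the indicator goal term and the weight `beta` are all assumptions.

```python
import numpy as np
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * np.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())


def sample_intrinsic_goal(visited_states, rng, subset_size=8):
    """Map one state out of a subset of visited states and take it as the
    intrinsic goal (here the subset is sampled uniformly; MASSE's actual
    subset-space construction is learned)."""
    k = min(subset_size, len(visited_states))
    subset = rng.choice(visited_states, size=k, replace=False)
    return subset[rng.integers(k)]


def intrinsic_reward(self_state, goal, env_states, goal_history, beta=0.5):
    """Toy intrinsic reward combining (i) closeness of the agent's own
    state to the intrinsic goal (indicator term for discrete states) and
    (ii) a mutual-information term between recent environment states and
    recent goals, rewarding states that inform about the environment."""
    goal_term = 1.0 if self_state == goal else 0.0
    mi_term = mutual_information(env_states, goal_history)
    return goal_term + beta * mi_term
```

With this shaping, an agent whose own state matches the sampled goal receives the goal term, and trajectories whose environment states are statistically dependent on the pursued goals receive a positive MI bonus, so exploration is steered toward goal-relevant, informative regions even when the external reward is sparse.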

Keywords: Reinforcement Learning; Sparse Reward; Mutual Information; Intrinsic Rewards

方宝富、余婷婷、王浩、王在俊


School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China

Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machines, Hefei University of Technology, Hefei 230601, China

Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Guanghan 618307, China


Supported by the National Natural Science Foundation of China (No. 61872327), the Natural Science Foundation of Anhui Province (No. 2308085MF203), the University Synergy Innovation Program of Anhui Province (No. GXXT-2022-055) and the Open Fund of the Key Laboratory of Flight Techniques and Flight Safety (No. FZ2022KF09)


Pattern Recognition and Artificial Intelligence
Sponsored by the Chinese Association of Automation, the National Research Center for Intelligent Computing Systems and the Institute of Intelligent Machines, Chinese Academy of Sciences


Indexed by CSTPCD and the Peking University Core Journal List
Impact factor: 0.954
ISSN: 1003-6059
Year, Volume (Issue): 2024, 37(5)