Multi-agent Reinforcement Learning Algorithm Based on State Space Exploration in Sparse Reward Scenarios
FANG Baofu 1, YU Tingting 1, WANG Hao 1, WANG Zaijun 2
Author Information
- 1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China; Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machines, Hefei University of Technology, Hefei 230601, China
- 2. Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Guanghan 618307, China
Abstract
In multi-agent task scenarios, a large and diverse state space is often encountered, and in some cases the reward information provided by the external environment may be extremely limited, exhibiting sparse reward characteristics. Most existing multi-agent reinforcement learning algorithms are of limited effectiveness in such sparse reward scenarios, because relying only on accidentally discovered reward sequences leads to a slow and inefficient learning process. To address this issue, a multi-agent reinforcement learning algorithm based on state space exploration (MASSE) in sparse reward scenarios is proposed. MASSE constructs a subset space of states, maps one state from this subset, and takes it as an intrinsic goal, enabling agents to utilize the state space more fully and reduce unnecessary exploration. Agent states are decomposed into self-states and environmental states, and intrinsic rewards based on mutual information are generated by combining these two types of states with the intrinsic goals. By constructing the state subset space and generating mutual-information-based intrinsic rewards, states close to the target state and states that deepen the agent's understanding of the environment are rewarded appropriately. Consequently, agents are motivated to move more actively toward the goal while enhancing their understanding of the environment, which guides them to adapt flexibly to sparse reward scenarios. Experimental results indicate that MASSE achieves superior performance in multi-agent collaborative scenarios with varying degrees of sparsity.
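The abstract describes intrinsic rewards that combine proximity to an intrinsic goal with a mutual-information term over the decomposed states, but does not give the exact formulation. The sketch below is therefore purely illustrative, not the paper's method: the empirical MI estimator over discretized state features, the function names, and the weighting parameters `alpha` and `beta` are all assumptions introduced for illustration.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete
    sequences, e.g. discretized environmental-state features and
    goal-related features. I(X;Y) = sum p(x,y) log[p(x,y)/(p(x)p(y))]."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # joint counts
    px = Counter(xs)             # marginal counts for X
    py = Counter(ys)             # marginal counts for Y
    mi = 0.0
    for (x, y), c in pxy.items():
        # c/n divided by (px[x]/n * py[y]/n) simplifies to c*n/(px*py)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def intrinsic_reward(self_state, goal, env_mi, alpha=1.0, beta=0.1):
    """Hypothetical reward shaping: reward closeness of the agent's
    self-state to the intrinsic goal, plus a bonus proportional to a
    mutual-information estimate for the environmental state."""
    dist = math.dist(self_state, goal)  # Euclidean distance to goal
    return -alpha * dist + beta * env_mi
```

Under this toy estimator, identical sequences yield MI equal to their entropy and independent sequences yield zero, so the `beta` term rewards states whose features are informative about the goal, while the `alpha` term pulls the agent toward the intrinsic goal.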
Keywords
Reinforcement Learning / Sparse Reward / Mutual Information / Intrinsic Rewards
Funding
National Natural Science Foundation of China (61872327)
Natural Science Foundation of Anhui Province (2308085MF203)
University Collaborative Innovation Program of Anhui Province (GXXT-2022-055)
Open Fund of the Key Laboratory of Flight Techniques and Flight Safety (FZ2022KF09)
Publication Year
2024