
基于深度确定性策略梯度与注意力Critic的多智能体协同清障算法

动态障碍物一直是阻碍智能体自主导航发展的关键因素,而躲避障碍物和清理障碍物是两种解决动态障碍物问题的有效方法.近年来,多智能体躲避动态障碍物(避障)问题受到了广大学者的关注,优秀的多智能体避障算法纷纷涌现.然而,多智能体清理动态障碍物(清障)问题却无人问津,相对应的多智能体清障算法更是屈指可数.为解决多智能体清障问题,文中提出了一种基于深度确定性策略梯度与注意力Critic的多智能体协同清障算法(Multi-Agent Cooperative Algorithm for Obstacle Clearance Based on Deep Deterministic Policy Gradient and Attention Critic,MACOC).首先,创建了首个多智能体协同清障的环境模型,定义了多智能体及动态障碍物的运动学模型,并根据智能体和动态障碍物数量的不同,构建了4种仿真实验环境;其次,将多智能体协同清障过程定义为马尔可夫决策过程(Markov Decision Process,MDP),构建了多智能体的状态空间、动作空间和奖励函数;最后,提出一种基于深度确定性策略梯度与注意力Critic的多智能体协同清障算法,并在多智能体协同清障仿真环境中与经典的多智能体强化学习算法进行对比.实验证明,相比对比算法,所提出的MACOC算法清障的成功率更高、速度更快,对复杂环境的适应性更好.
Multi-agent Cooperative Algorithm for Obstacle Clearance Based on Deep Deterministic Policy Gradient and Attention Critic
Dynamic obstacles have long been a key factor hindering the development of autonomous navigation for agents, and obstacle avoidance and obstacle clearance are two effective ways to address them. In recent years, multi-agent obstacle avoidance (collision avoidance) has been an active research area, and numerous excellent multi-agent obstacle avoidance algorithms have emerged. However, the problem of multi-agent clearance of dynamic obstacles has received little attention, and the corresponding algorithms are scarce. To address this problem, a multi-agent cooperative algorithm for obstacle clearance based on deep deterministic policy gradient and attention Critic (MACOC) is proposed. Firstly, the first multi-agent cooperative environment model for obstacle clearance is created, the kinematic models of the agents and dynamic obstacles are defined, and four simulation environments with different numbers of agents and dynamic obstacles are constructed. Secondly, the multi-agent cooperative obstacle clearance process is formulated as a Markov decision process (MDP), and the multi-agent state space, action space, and reward function are constructed. Finally, the MACOC algorithm is proposed and compared with classical multi-agent reinforcement learning algorithms in the simulated obstacle clearance environments. Experimental results show that the proposed MACOC algorithm achieves a higher clearance success rate, faster clearance, and better adaptability to complex environments than the baseline algorithms.
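The abstract does not specify the attention Critic's architecture, so as a rough illustration only, the sketch below shows the general pattern such centralized attention Critics typically follow (in the style of MAAC): each agent's Critic forms a query from its own state-action embedding and attends over the other agents' embeddings via scaled dot-product attention, and the attended context is concatenated with the agent's own embedding before the Q head. All function names, dimensions, and weight matrices here are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_critic_features(embeddings, i, W_q, W_k, W_v):
    """Build the input to agent i's Q head by attending over the
    other agents' state-action embeddings (MAAC-style sketch)."""
    q = W_q @ embeddings[i]                          # query from agent i
    others = [j for j in range(len(embeddings)) if j != i]
    keys = np.stack([W_k @ embeddings[j] for j in others])
    vals = np.stack([np.tanh(W_v @ embeddings[j]) for j in others])
    scores = keys @ q / np.sqrt(len(q))              # scaled dot-product
    attn = softmax(scores)                           # weights over other agents
    context = attn @ vals                            # attention-weighted summary
    return np.concatenate([embeddings[i], context])  # own embedding + context

# Toy example: 4 agents, 8-dimensional state-action embeddings.
rng = np.random.default_rng(0)
n_agents, d = 4, 8
emb = [rng.standard_normal(d) for _ in range(n_agents)]
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
feat = attention_critic_features(emb, 0, W_q, W_k, W_v)
print(feat.shape)  # -> (16,): own embedding (8) concatenated with context (8)
```

In a full DDPG-style training loop, `feat` would feed a small MLP producing Q(s, a), with target networks and a replay buffer as usual; those parts are omitted here.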

Reinforcement learning algorithm; Markov decision process; Multi-agent cooperative control; Dynamic obstacle clearance; Attention mechanism

王宪伟、冯翔、虞慧群


Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China

Shanghai Engineering Research Center of Smart Energy, Shanghai 200237, China

强化学习算法 马尔可夫决策过程 多智能体协同控制 动态障碍物清除 注意力机制

National Natural Science Foundation of China (General Program); National Natural Science Foundation of China (Key Program); National Key R&D Program of China; Special Fund for Information Development of the Shanghai Municipal Commission of Economy and Informatization; Shanghai Science and Technology Innovation Action Plan

62276097, 62136003, 2020YFB1711700, XX-XXFZ-02-20-2463, 21002411000

Computer Science (计算机科学)
Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.944
ISSN: 1002-137X
Year, Volume (Issue): 2024, 51(7)