Movement strategy of multiple robots under grid environment constraints (多机器人在网格环境约束下的运动策略)

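Chinese abstract (translated): To address pathfinding and obstacle avoidance planning for multiple agents in grid environments, a distributed multi-robot obstacle avoidance and navigation method based on deep reinforcement learning is proposed. The policy model is trained with a variant of the Proximal Policy Optimization (PPO) algorithm adapted to discrete decision-making. From each agent's own preceding multi-frame simulated lidar distance information, the model generates actions that conform to preset specifications, enabling pathfinding and obstacle avoidance for the multi-robot system in different environments. Introducing a density reward, a distance reward, and a step-length penalty during training improves the agents' ability to avoid obstacles and find paths in a scene, mitigates congestion and deadlock, and reduces the generation of invalid paths. In simulation, the model was tested in random scenes, complex interaction scenes, and obstacle scenes. The experiments show that, compared with centralized planning methods, the model greatly reduces planning time and improves generalization and stability. Comparison with other distributed methods shows that the proposed density and distance rewards help agents complete tasks safely and quickly, and narrow the gap in planning quality with centralized planning.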

English title: Movement strategy of multiple robots under grid environment constraints

English abstract: To address multi-agent pathfinding and obstacle avoidance planning in grid environments, a distributed multi-robot obstacle avoidance and navigation method based on deep reinforcement learning is proposed. The policy model is trained with a variant of the Proximal Policy Optimization (PPO) algorithm adapted to discrete decision-making. From each agent's multi-frame lidar distance information, the model generates actions that conform to preset specifications, enabling pathfinding and obstacle avoidance for the multi-robot system in different environments. Introducing a density reward, a distance reward, and a step-length penalty during training improves the agents' ability to avoid obstacles and find paths in a scene, mitigates congestion, deadlock, and similar problems, and reduces the generation of invalid paths. In the experiments, the model was tested in simulation on random scenes, complex interaction scenes, and obstacle scenes. The results show that, compared with centralized planning methods, the model greatly reduces planning time and improves generalization and stability. Comparison with other distributed methods shows that the proposed density and distance rewards help agents complete tasks safely and quickly, and narrow the gap in planning quality with centralized planning.
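The abstract names three shaping terms (density reward, distance reward, step-length penalty) but gives no formulas. The following is only a minimal sketch of how such terms could be combined for one agent on a grid. The function `shaped_reward`, the class `RewardWeights`, every weight value, the Manhattan distance measure, the square neighbourhood used for density, and the reading of the step-length penalty as a constant per-step cost are all assumptions made for illustration, not the authors' definitions.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Cell = Tuple[int, int]  # (row, col) on the grid


def manhattan(a: Cell, b: Cell) -> int:
    """Manhattan distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


@dataclass(frozen=True)
class RewardWeights:
    # All values are hypothetical placeholders, not taken from the paper.
    distance: float = 0.1   # reward per cell of progress toward the goal
    density: float = 0.05   # penalty per nearby agent
    step: float = 0.01      # constant cost charged on every step
    radius: int = 2         # neighbourhood half-width for the density term


def shaped_reward(
    prev_pos: Cell,
    new_pos: Cell,
    goal: Cell,
    others: Iterable[Cell],
    w: RewardWeights = RewardWeights(),
) -> float:
    """Combine distance, density, and step terms into one scalar reward."""
    # Distance term: positive when the move brings the agent closer to its goal.
    progress = manhattan(prev_pos, goal) - manhattan(new_pos, goal)

    # Density term: count other agents inside a square window around new_pos.
    crowd = sum(
        1
        for o in others
        if max(abs(o[0] - new_pos[0]), abs(o[1] - new_pos[1])) <= w.radius
    )

    # Step term: a fixed cost per step discourages long or idle paths.
    return w.distance * progress - w.density * crowd - w.step
```

Under these assumed weights, a move toward the goal in an empty neighbourhood yields a small positive reward, while the same move into a crowded area or a move away from the goal yields less. In a PPO training loop this scalar would be returned by the environment's step function alongside the stacked lidar observation.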

English keywords:

multi-robot system; deep reinforcement learning; grid workspace; path finding and obstacle avoidance

Authors:

Li Shuo (李硕), Zhao Yongting (赵永廷), He Pan (何盼), Gao Peng (高鹏), Wang Xiaojun (王小军), Zhao Lijun (赵立军), Zheng Bin (郑彬)

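Author affiliations:

Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400722

School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065

Zhongke Wanxun Intelligent Technology (Suzhou) Co., Ltd., Suzhou, Jiangsu 215153

School of Intelligent Manufacturing Engineering, Chongqing University of Arts and Sciences, Chongqing 402160

Chongqing Key Laboratory of Artificial Intelligence and Service Robot Control Technology, Chongqing 400722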

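Keywords (translated from Chinese):

multi-agent; deep reinforcement learning; grid workspace; pathfinding and collision avoidance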

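Funding:

General Program of the Chongqing Natural Science Foundation; Key Demonstration Project of the Chongqing Technology Innovation and Application Demonstration Special Program; Key Project of the Chongqing Technology Innovation and Application Development Special Program; Key Project of the Chongqing Technology Innovation and Application Development Special Program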

Project numbers:

cstc2019jcyjmsxmX0442; cstc2018jszxcyzdX0068; cstc2021jscxgksbX0003; cstc2021jscxgksbX0020

Publication year:

2024

DOI:

10.13196/j.cims.2022.0023

Computer Integrated Manufacturing Systems (计算机集成制造系统)

No. 210 Research Institute of China North Industries Group Corporation

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 1.092

ISSN: 1006-5911

Year, volume (issue): 2024, 30(9)