Abstract
To address the problems of excessive centralization, low system robustness, and poor formation stability in multi-robot cooperative formation tasks, this paper proposes the projected reward for multi-robot formation and obstacle avoidance (PRMFO) model, which achieves decentralized decision-making for multi-robot systems based on a unified state representation method. The unified state representation ensures consistency in processing the interaction information between each robot and the external environment. Building on this representation, a projection-based reward mechanism vectorizes the reward in both the distance and direction dimensions, enriching the basis for each robot's decisions. To mitigate excessive centralization, an autonomous decision layer is established that integrates the unified state representation and the projected reward mechanism into the soft actor-critic (SAC) algorithm, accomplishing the cooperative multi-robot formation and obstacle avoidance task. Simulation experiments in the robot operating system (ROS) environment show that PRMFO improves the single-robot average return, success rate, and time metrics by 42%, 8%, and 9%, respectively, and keeps the multi-robot formation error within the range 0 to 0.06, achieving high-precision multi-robot formation.
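The abstract does not give the paper's exact projected-reward formulation. As a rough illustration of what vectorizing the reward along distance and direction dimensions could look like, the following Python sketch projects a robot's per-step displacement onto the unit vector toward its goal: the projection length serves as a distance (progress) term and the cosine of the heading angle as a direction term. The function name `projected_reward` and the weights `w_dist` and `w_dir` are hypothetical, not taken from the paper.

```python
import numpy as np

def projected_reward(prev_pos, curr_pos, goal_pos, w_dist=1.0, w_dir=0.5):
    """Hypothetical sketch of a projection-based reward.

    The step displacement is projected onto the unit vector pointing
    at the goal, yielding a distance term (signed progress along the
    goal direction) and a direction term (alignment of the movement
    with the goal direction).
    """
    move = curr_pos - prev_pos                      # displacement this step
    to_goal = goal_pos - prev_pos                   # vector toward the goal
    goal_dir = to_goal / (np.linalg.norm(to_goal) + 1e-8)  # unit goal direction

    r_dist = np.dot(move, goal_dir)                 # projected progress (distance dimension)
    r_dir = r_dist / (np.linalg.norm(move) + 1e-8)  # cosine alignment (direction dimension)

    return w_dist * r_dist + w_dir * r_dir

# Example: a step that moves mostly toward the goal earns a positive reward.
r = projected_reward(np.array([0.0, 0.0]),
                     np.array([0.3, 0.1]),
                     np.array([2.0, 0.0]))
print(f"projected reward: {r:.3f}")
```

Under these assumptions, a step straight toward the goal maximizes both terms, while a sideways or backward step is penalized through the signed projection, which is consistent with the abstract's claim that the vectorized reward enriches each robot's decision basis.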
Key words
deep reinforcement learning / cooperative multi-robot / formation and obstacle avoidance / projected reward