Dynamic Obstacle Avoidance for Service Robots Based on Spatio-Temporal Graph Attention Network

Service robots navigating dense crowds of pedestrians who make their own decisions are prone to collisions, freezing, and unnatural paths. To address these problems, this study proposes a dynamic obstacle avoidance algorithm for service robots based on a spatio-temporal graph attention network within a Deep Reinforcement Learning (DRL) framework. The spatio-temporal graph attention network serves as the decision function of the Proximal Policy Optimization (PPO) algorithm. First, a Gated Recurrent Unit (GRU) controls how much of the environment the robot remembers and forgets and extracts the temporal features of the environment, which gives the robot some ability to predict pedestrian movement trends. Second, a graph attention network captures the implicit spatial interaction features between the robot and pedestrians, enabling the robot to find collision-free paths. Finally, the spatio-temporal graph attention network is trained with PPO so that the robot can complete collision-free navigation tasks in a crowd. The algorithm is validated by simulation experiments in a dynamic closed environment with 2.5 m² per person. Compared with the non-learning Dynamic Window Algorithm (DWA), the proposed algorithm improves the navigation success rate by 71 percentage points. Compared with the learning-based DSRNN-RL algorithm, it improves the navigation success rate by 3 percentage points and produces shorter navigation paths. A real-time navigation test in the Gazebo environment shows that the average inference time of the proposed algorithm is 21.90 ms, which meets the requirements of real-time navigation.
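The two feature-extraction stages described in the abstract can be illustrated with a minimal sketch: a GRU summarizes each agent's recent observations, and a single-head graph attention step lets the robot node weigh every agent in the scene. This is not the authors' implementation. The names `GRUCell`, `robot_attention`, and `encode_scene`, the `(x, y, vx, vy)` observation layout, the layer sizes, and the random untrained weights are all assumptions made for illustration, and the PPO training loop is omitted.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained policy would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Textbook GRU update with biases omitted for brevity (hypothetical helper)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One input and one recurrent matrix per gate: z (update), r (reset), n (candidate).
        self.W = {g: rand_matrix(hid_dim, in_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hid_dim, hid_dim, rng) for g in "zrn"}

    def _pre(self, gate, x, h):
        return [p + q for p, q in zip(matvec(self.W[gate], x), matvec(self.U[gate], h))]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._pre("z", x, h)]
        r = [sigmoid(v) for v in self._pre("r", x, h)]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._pre("n", x, gated_h)]
        # z sets how much old memory is kept versus replaced by the candidate n.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def robot_attention(nodes, W, a, slope=0.2):
    """Single-head GAT-style attention from the robot (nodes[0]) over all nodes."""
    proj = [matvec(W, h) for h in nodes]
    scores = []
    for p in proj:
        # Score = LeakyReLU(a . [W h_robot || W h_j]).
        e = sum(ai * v for ai, v in zip(a, proj[0] + p))
        scores.append(e if e > 0 else slope * e)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    alpha = [v / total for v in exps]
    fused = [sum(al * p[k] for al, p in zip(alpha, proj)) for k in range(len(proj[0]))]
    return alpha, fused


def encode_scene(trajectories, gru, W, a):
    """trajectories[i] holds the observation sequence of agent i (index 0 is the robot)."""
    hidden = []
    for traj in trajectories:
        h = [0.0] * gru.hid_dim
        for obs in traj:
            h = gru.step(obs, h)  # temporal features first
        hidden.append(h)
    return robot_attention(hidden, W, a)  # then spatial interaction features


rng = random.Random(0)
gru = GRUCell(in_dim=4, hid_dim=8, rng=rng)
W = rand_matrix(8, 8, rng)
a = [rng.uniform(-0.5, 0.5) for _ in range(16)]
# Hypothetical scene: robot plus 3 pedestrians, 5 time steps of (x, y, vx, vy).
scene = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)] for _ in range(4)]
alpha, fused = encode_scene(scene, gru, W, a)
```

Here `alpha` holds one attention weight per agent and sums to 1, and `fused` is the attention-weighted feature vector that a policy head would map to an action. In a PPO setup, the weights would be learned from the navigation reward rather than drawn at random.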

Keywords: service robot; dynamic obstacle avoidance; Deep Reinforcement Learning (DRL); spatio-temporal graph attention network; real-time navigation

Du Haijun, Yu Su

School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China

Funding: Scientific Research Program of the Science and Technology Commission of Shanghai Municipality (No. 17511110204)

2024

Computer Engineering
East China Institute of Computing Technology; Shanghai Computer Society

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.581
ISSN: 1000-3428
Year, volume (issue): 2024, 50(2)