Tsinghua Science and Technology, 2024, Vol. 29, Issue 1: 244-256. DOI: 10.26599/TST.2023.9010003

Grasp Detection with Hierarchical Multi-Scale Feature Fusion and Inverted Shuffle Residual

Wenjie Geng 1, Zhiqiang Cao 1, Peiyu Guan 1, Fengshui Jing 1, Min Tan 1, Junzhi Yu 2

Author information

  • 1. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
  • 2. Department of Advanced Manufacturing and Robotics, College of Engineering, Peking University, Beijing 100871, China

Abstract

Grasp detection plays a critical role in robot manipulation. Mainstream pixel-wise grasp detection networks with an encoder-decoder structure have received much attention owing to their good accuracy and efficiency. However, they usually transmit only the high-level feature in the encoder to the decoder, and low-level features are neglected. Low-level features contain abundant detail information, and how to fully exploit them remains unsolved. Meanwhile, the channel information in the high-level feature is also not well mined. As a result, the performance of grasp detection is degraded. To solve these problems, we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual. Both low-level and high-level features in the encoder are first fused by the designed skip connections with attention module, and the fused information is then propagated to the corresponding layers of the decoder for in-depth feature fusion. Such a hierarchical fusion guarantees the quality of grasp prediction. Furthermore, an inverted shuffle residual module is created, in which the high-level feature from the encoder is split along the channel dimension and the resulting split features are processed in their respective branches. Through such differentiated processing, more high-dimensional channel information is kept, which enhances the representation ability of the network. In addition, an information enhancement module is added before the encoder to reinforce the input information. The proposed method attains 98.9% and 97.8% image-wise and object-wise accuracy, respectively, on the Cornell grasping dataset, and the experimental results verify the effectiveness of the method.
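
The channel split and per-branch processing mentioned in the abstract can be illustrated with a small pure-Python sketch. The abstract does not specify the module's layers, so everything here is an assumption made for illustration: the function names, the even two-way split, the ShuffleNet-style channel shuffle, and the residual addition are hypothetical and are not the authors' implementation.

```python
from typing import Callable, List, Sequence

Channel = List[float]  # one feature channel, flattened to a list of values


def channel_shuffle(channels: Sequence, groups: int) -> list:
    """ShuffleNet-style shuffle: interleave channels taken from each group."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Equivalent to reshape (groups, per_group) -> transpose -> flatten.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def split_process_shuffle(
    features: List[Channel],
    branch_a: Callable[[Channel], Channel],
    branch_b: Callable[[Channel], Channel],
    groups: int = 2,
) -> List[Channel]:
    """Split channels in two, process each half in its own branch,
    concatenate, shuffle, and add the input back (assumed residual)."""
    half = len(features) // 2
    left, right = features[:half], features[half:]
    merged = [branch_a(c) for c in left] + [branch_b(c) for c in right]
    shuffled = channel_shuffle(merged, groups)
    # Residual connection: element-wise sum with the original channels.
    return [[x + y for x, y in zip(out_c, in_c)]
            for out_c, in_c in zip(shuffled, features)]


if __name__ == "__main__":
    feats = [[1.0], [2.0], [3.0], [4.0]]  # four toy channels
    result = split_process_shuffle(
        feats,
        branch_a=lambda c: list(c),             # stand-in branch: identity
        branch_b=lambda c: [10 * v for v in c],  # stand-in branch: scaling
    )
    print(result)
```

In a real network each branch would be a stack of convolutional layers; the stand-in lambdas only show that the two halves of the channels follow different paths before being mixed again.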

Key words

grasp detection; hierarchical multi-scale feature fusion; skip connections with attention; inverted shuffle residual


Funding

National Natural Science Foundation of China (62073322)

National Natural Science Foundation of China (61633020)

CIE-Tencent Robotics X Rhino-Bird Focused Research Program (2022-07)

Beijing Natural Science Foundation (2022MQ05)

Publication year

2024

Tsinghua Science and Technology
Tsinghua University

CSTPCD, EI
Impact factor: 0.474
ISSN: 1007-0214
Number of references: 44