Cross-modal interaction fusion grasping detection based on Transformer-CNN hybrid architecture
In the field of robotic grasping detection, there is still considerable room for improvement in the processing efficiency of RGB and depth images. This article proposes a novel RGB-D cross-modal interactive fusion method for robotic grasping detection based on a Transformer-CNN hybrid architecture. To fully exploit the feature information of RGB and depth images, an efficient cross-modal feature interaction fusion module is developed, which calibrates the corresponding feature information of the RGB and depth images and interactively enhances the bimodal features. In addition, a parallel Transformer-CNN network module is designed, combining the local modeling ability of the CNN with the global modeling ability of the Transformer to obtain better feature representations and thereby improve grasping detection performance. Experimental results show that the proposed method achieves accuracies of 99.1% and 96.2% on the Cornell and Jacquard grasping datasets, respectively. Grasping detection experiments in real scenes verify that the proposed method can effectively predict the grasp poses of objects in various scenarios.
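The abstract names two components: a cross-modal interaction fusion module that calibrates and mutually enhances RGB and depth features, and a parallel Transformer-CNN branch that combines local and global modeling. The sketch below illustrates both ideas in minimal NumPy; it is not the paper's implementation, and the specific gating (sigmoid channel attention) and attention forms, as well as all function names, are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_fusion(rgb_feat, depth_feat):
    """Hypothetical cross-modal interaction fusion: each modality's
    global channel statistics gate the other modality's features,
    then the calibrated maps are merged. Shapes: (C, H, W)."""
    rgb_desc = rgb_feat.mean(axis=(1, 2))      # per-channel descriptor, (C,)
    depth_desc = depth_feat.mean(axis=(1, 2))  # per-channel descriptor, (C,)
    # Cross calibration: RGB statistics recalibrate depth channels, and vice versa
    depth_cal = depth_feat * sigmoid(rgb_desc)[:, None, None]
    rgb_cal = rgb_feat * sigmoid(depth_desc)[:, None, None]
    return rgb_cal + depth_cal                 # interactive enhancement by merging

def parallel_transformer_cnn(x):
    """Hypothetical parallel Transformer-CNN block: a local branch
    (3x3 mean filter standing in for a conv layer) plus a global
    branch (softmax self-attention over spatial positions)."""
    C, H, W = x.shape
    # CNN branch: 3x3 local aggregation with edge padding
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    local = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            local += pad[:, dy:dy + H, dx:dx + W]
    local /= 9.0
    # Transformer branch: each spatial position is a token of dimension C
    tokens = x.reshape(C, H * W).T                       # (HW, C)
    scores = tokens @ tokens.T / np.sqrt(C)              # (HW, HW)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)              # row-wise softmax
    global_ = (attn @ tokens).T.reshape(C, H, W)
    return local + global_                               # fuse the two branches

rng = np.random.default_rng(0)
rgb = rng.random((8, 16, 16))
depth = rng.random((8, 16, 16))
fused = cross_modal_fusion(rgb, depth)
out = parallel_transformer_cnn(fused)
print(out.shape)  # -> (8, 16, 16)
```

In a real network both branches would be learned layers (convolutions and multi-head attention with projections); the sketch only shows how the calibrated bimodal features and the local/global outputs are combined by addition.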

robotic grasping detection; cross-modal; RGB-D fusion; Transformer; CNN

Wang Yong, Li Yiling, Miao Duoqian, An Chunyan, Yuan Xinlin


Liangjiang College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China

College of Electronics and Information Engineering, Tongji University, Shanghai 200092, China


Control and Decision (Northeastern University)
Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.227
ISSN: 1001-0920
Year, volume (issue): 2024, 39(11)