Cross-modal interaction fusion grasping detection based on Transformer-CNN hybrid architecture
In the field of robotic grasping detection, there is still considerable room for improvement in the processing efficiency of RGB and depth images. This article proposes a novel RGB-D cross-modal interactive fusion method for robotic grasping detection based on a Transformer-CNN hybrid architecture. To fully exploit the feature information of RGB and depth images, an efficient cross-modal feature interaction fusion module is developed, which calibrates the corresponding feature information of the RGB and depth images and interactively enhances the bimodal features. In addition, a parallel Transformer-CNN network module is designed, combining the local modeling ability of the CNN with the global modeling ability of the Transformer to obtain better feature representations and thereby improve grasping detection performance. Experimental results show that the proposed method achieves accuracies of 99.1% and 96.2% on the Cornell and Jacquard grasping datasets, respectively. Grasping detection experiments in real scenes verify that the proposed method can effectively predict the grasp poses of objects in various scenarios.
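The abstract names two components: a cross-modal interaction fusion module that calibrates and mutually enhances RGB and depth features, and a parallel Transformer-CNN branch that combines local and global modeling. The sketch below illustrates both ideas in minimal NumPy; it is not the paper's implementation, and the specific gating (sigmoid channel attention) and attention forms, as well as all function names, are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_fusion(rgb_feat, depth_feat):
    """Hypothetical cross-modal interaction fusion: each modality's
    global channel statistics gate the other modality's features,
    then the calibrated maps are merged. Shapes: (C, H, W)."""
    rgb_desc = rgb_feat.mean(axis=(1, 2))      # per-channel descriptor, (C,)
    depth_desc = depth_feat.mean(axis=(1, 2))  # per-channel descriptor, (C,)
    # Cross calibration: RGB statistics recalibrate depth channels, and vice versa
    depth_cal = depth_feat * sigmoid(rgb_desc)[:, None, None]
    rgb_cal = rgb_feat * sigmoid(depth_desc)[:, None, None]
    return rgb_cal + depth_cal                 # interactive enhancement by merging

def parallel_transformer_cnn(x):
    """Hypothetical parallel Transformer-CNN block: a local branch
    (3x3 mean filter standing in for a conv layer) plus a global
    branch (softmax self-attention over spatial positions)."""
    C, H, W = x.shape
    # CNN branch: 3x3 local aggregation with edge padding
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    local = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            local += pad[:, dy:dy + H, dx:dx + W]
    local /= 9.0
    # Transformer branch: each spatial position is a token of dimension C
    tokens = x.reshape(C, H * W).T                       # (HW, C)
    scores = tokens @ tokens.T / np.sqrt(C)              # (HW, HW)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)              # row-wise softmax
    global_ = (attn @ tokens).T.reshape(C, H, W)
    return local + global_                               # fuse the two branches

rng = np.random.default_rng(0)
rgb = rng.random((8, 16, 16))
depth = rng.random((8, 16, 16))
fused = cross_modal_fusion(rgb, depth)
out = parallel_transformer_cnn(fused)
print(out.shape)  # -> (8, 16, 16)
```

In a real network both branches would be learned layers (convolutions and multi-head attention with projections); the sketch only shows how the calibrated bimodal features and the local/global outputs are combined by addition.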

robotic grasping detection; cross-modal; RGB-D fusion; Transformer; CNN

Wang Yong, Li Yiling, Miao Duoqian, An Chunyan, Yuan Xinlin


Liangjiang College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China

College of Electronics and Information Engineering, Tongji University, Shanghai 200092, China


Control and Decision (Northeastern University)
Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.227
ISSN: 1001-0920
Year, volume (issue): 2024, 39(11)