Science China Technological Sciences (English edition), 2024, Issue 7: 2176-2190. DOI: 10.1007/s11431-023-2552-x

Image attention transformer network for indoor 3D object detection

REN KeYan YAN Tong HU ZhaoXin HAN HongGui ZHANG YunLu

Author information

  • 1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Abstract

Point clouds and RGB images are both critical data for 3D object detection. While recent multi-modal methods combine them directly and show remarkable performance, they ignore the distinct forms of these two types of data. To mitigate the influence of this intrinsic difference on performance, we propose a novel and effective fusion model, the LI-Attention model, which takes both RGB features and point cloud features into consideration and assigns a weight to each RGB feature through an attention mechanism. Furthermore, based on the LI-Attention model, we propose a 3D object detection method called image attention transformer network (IAT-Net), specialized for indoor RGB-D scenes. Compared with previous work on multi-modal detection, IAT-Net fuses elaborate RGB features from 2D detection results with point cloud features through an attention mechanism, and generates and refines 3D detection results with a transformer model. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on two widely used benchmarks for indoor 3D object detection, SUN RGB-D and NYU Depth V2, and ablation studies are provided to analyze the effect of each module. The source code for the proposed IAT-Net is publicly available at https://github.com/wisper181/IAT-Net.
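
The idea of assigning a weight to each RGB feature by attention can be illustrated with a minimal sketch. This is not the LI-Attention model itself, whose design is not given in the abstract; it assumes a generic scaled dot-product attention in which one point cloud feature acts as the query over a set of RGB features, and the names `softmax` and `attend_rgb` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_rgb(point_feat, rgb_feats):
    """Weight RGB features by their similarity to one point feature.

    Assumption: generic scaled dot-product attention, used here only
    for illustration; the paper's actual fusion may differ.
    """
    d = len(point_feat)
    # One similarity score per RGB feature (point feature is the query).
    scores = [
        sum(p * r for p, r in zip(point_feat, rgb)) / math.sqrt(d)
        for rgb in rgb_feats
    ]
    weights = softmax(scores)
    # Attention-weighted sum of the RGB features.
    rgb_summary = [
        sum(w * rgb[i] for w, rgb in zip(weights, rgb_feats))
        for i in range(d)
    ]
    # Fuse by concatenating the point feature with the RGB summary.
    return weights, point_feat + rgb_summary


weights, fused = attend_rgb([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy call the first RGB feature is aligned with the point feature, so it receives the larger weight, and the fused vector has twice the input dimension.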

Key words

3D object detection/transformer/attention mechanism

Funding

National Natural Science Foundation of China (61803004)

Aeronautical Science Foundation of China (20161375002)

Publication year

2024
Science China Technological Sciences (English edition)
Chinese Academy of Sciences

Indexed in: CSTPCD, EI
Impact factor: 1.056
ISSN: 1674-7321