CANet: Co-attention network for RGB-D semantic segmentation
Original article links: NSTL, Elsevier
Incorporating depth (D) information into RGB images has proven effective and robust for semantic segmentation. However, fusing the two modalities is not trivial because of their inherent discrepancy in physical meaning: RGB encodes appearance (color) information, whereas D encodes geometric depth information. In this paper, we propose a co-attention network (CANet) to build sound interaction between RGB and depth features. The key component of CANet is the co-attention fusion part, which consists of three modules. Specifically, the position and channel co-attention fusion modules adaptively fuse RGB and depth features in the spatial and channel dimensions, respectively. An additional fusion co-attention module further integrates the outputs of the position and channel co-attention fusion modules to obtain a more representative feature, which is used for the final semantic segmentation. Extensive experiments demonstrate the effectiveness of CANet in fusing RGB and depth features, achieving state-of-the-art performance on two challenging RGB-D semantic segmentation datasets, i.e., NYUDv2 and SUN-RGBD. (c) 2021 Elsevier Ltd. All rights reserved.
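The abstract's three-module fusion scheme can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the feature shapes, the use of RGB as queries and depth as keys/values, the residual connections, and the final summation standing in for the fusion co-attention module are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_co_attention(rgb, depth):
    """Spatial-dimension co-attention: each RGB position attends to all
    depth positions. Shapes (C, H, W) are assumed for illustration."""
    C, H, W = rgb.shape
    q = rgb.reshape(C, H * W)              # queries from RGB features
    k = depth.reshape(C, H * W)            # keys from depth features
    attn = softmax(q.T @ k, axis=-1)       # (HW, HW) spatial affinity map
    fused = (k @ attn.T).reshape(C, H, W)  # aggregate depth by affinity
    return rgb + fused                     # assumed residual fusion

def channel_co_attention(rgb, depth):
    """Channel-dimension co-attention: each RGB channel attends to all
    depth channels."""
    C, H, W = rgb.shape
    q = rgb.reshape(C, H * W)
    k = depth.reshape(C, H * W)
    attn = softmax(q @ k.T, axis=-1)       # (C, C) channel affinity map
    fused = (attn @ k).reshape(C, H, W)
    return rgb + fused

def co_attention_fusion(rgb, depth):
    """Stand-in for the fusion co-attention module: here simply a sum of
    the two branch outputs (the paper's third module is more elaborate)."""
    return position_co_attention(rgb, depth) + channel_co_attention(rgb, depth)

# Toy features: 8 channels on a 4x4 spatial grid.
rgb = np.random.rand(8, 4, 4)
depth = np.random.rand(8, 4, 4)
out = co_attention_fusion(rgb, depth)
print(out.shape)  # (8, 4, 4)
```

The point of the sketch is only the data flow: two affinity maps, one over spatial positions and one over channels, each letting the RGB stream adaptively pull in depth information before a final integration step.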