
Self-supervised Learning Based on Multi-modal Arbitrary Rotation for RGB-D Semantic Segmentation

Self-supervised learning on RGB-D data has attracted extensive attention. However, most methods focus on global-level representation learning and tend to lose the local details that are crucial for recognizing objects. The geometric consistency between image and depth in RGB-D data can serve as a clue to guide self-supervised feature representation learning. This study proposes ArbRot, which can not only rotate inputs by an unrestricted angle and generate multiple pseudo-labels for pretext tasks, but also establish contextual relationships between global and local features. The proposed ArbRot can be jointly trained with other contrastive learning methods to build a multi-modal, multi-pretext-task self-supervised learning framework, enforcing feature representation consistency between the image and depth views and thereby providing an effective initialization for RGB-D semantic segmentation. Experimental results on the SUN RGB-D and NYU Depth Dataset V2 datasets show that the feature representations obtained by multi-modal arbitrary-rotation self-supervised learning are of higher quality than those of the baseline models. Source code: https://github.com/Physu/ArbRot.
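The core of the pretext task above is sampling one unrestricted rotation angle, applying the same geometric transform to both the image and the depth map, and using the angle as the pseudo-label. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the names `rotate_nn` and `arbrot_sample` are hypothetical.

```python
import numpy as np

def rotate_nn(img, angle_deg):
    """Rotate an H x W (x C) array about its center by an arbitrary angle,
    using inverse-mapping nearest-neighbor sampling; out-of-bounds -> 0."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse rotation: for each output pixel, locate its source pixel.
    sx = cos_t * (xs - cx) + sin_t * (ys - cy) + cx
    sy = -sin_t * (xs - cx) + cos_t * (ys - cy) + cy
    sxi, syi = np.rint(sx).astype(int), np.rint(sy).astype(int)
    valid = (sxi >= 0) & (sxi < w) & (syi >= 0) & (syi < h)
    out = np.zeros_like(img)
    out[valid] = img[syi[valid], sxi[valid]]
    return out

def arbrot_sample(rgb, depth, rng):
    """Draw one unrestricted angle in [0, 360) and apply the SAME rotation
    to both modalities; the angle is the pseudo-label for the pretext task."""
    angle = rng.uniform(0.0, 360.0)
    return rotate_nn(rgb, angle), rotate_nn(depth, angle), angle
```

Because both modalities share one transform, a rotation-prediction head trained on either view receives a geometrically consistent target, which is what lets the pretext task tie the image and depth representations together.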

self-supervised learning; pretext task; contrastive learning; RGB-D; multi-modal

Li Hongyu (李鸿宇), Zhang Yifei (张宜飞), Yang Dongbao (杨东宝)


Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China

School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China


National Natural Science Foundation of China (General Program, 62376266); Basic Frontier Science Research Program of the Chinese Academy of Sciences, From 0 to 1 Original Innovation Project (ZDBS-LY-7024)

2024

Computer Systems & Applications
Institute of Software, Chinese Academy of Sciences


CSTPCD
Impact factor: 0.449
ISSN:1003-3254
Year, Volume (Issue): 2024, 33(1)