Computer Systems & Applications, 2024, Vol. 33, Issue 1: 219-230. DOI: 10.15888/j.cnki.csa.009362

Self-supervised Learning Based on Multi-modal Arbitrary Rotation for RGB-D Semantic Segmentation

Li Hongyu (李鸿宇) 1, Zhang Yifei (张宜飞) 1, Yang Dongbao (杨东宝) 1

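Author information

  • 1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China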

Abstract

Self-supervised learning on RGB-D data has attracted extensive attention. However, most methods focus on global-level representation learning, which tends to lose the local details that are crucial for recognizing objects. Because the image and depth in RGB-D data are geometrically consistent, this consistency can serve as a cue to guide self-supervised feature representation learning. In this study, we propose ArbRot, which rotates inputs by unrestricted angles, generates multiple pseudo-labels for pretext tasks, and establishes the contextual relationship between global and local regions. ArbRot can be trained jointly with contrastive learning methods to build a multi-modal, multi-pretext-task self-supervised learning framework that enforces feature consistency across image and depth views, thereby providing an effective initialization for RGB-D semantic segmentation. Experimental results on the SUN RGB-D and NYU Depth Dataset V2 datasets show that the feature representations learned by multi-modal arbitrary-rotation self-supervised learning are of higher quality than those of the baseline models. Code is available at https://github.com/Physu/ArbRot.
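
As a rough illustration of the idea summarized in the abstract, the sketch below applies one randomly drawn, unrestricted angle to an aligned RGB image and depth map (so the two stay geometrically consistent) and derives several pseudo-labels from that angle. It is not the authors' implementation: the helper names `rotate_nearest` and `make_rotation_sample`, the nearest-neighbor resampling, and the particular pseudo-labels (raw angle, 90-degree quadrant, residual angle) are all assumptions made for illustration only.

```python
import math
import numpy as np


def rotate_nearest(img: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate an H x W (x C) array about its center, nearest-neighbor, zero fill.

    Hypothetical helper. Sign convention: for square inputs, 90 degrees
    matches np.rot90(img, k=-1).
    """
    h, w = img.shape[:2]
    theta = math.radians(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # Inverse mapping: for every output pixel, look up its source pixel.
    src_x = np.rint(math.cos(theta) * dx + math.sin(theta) * dy + cx).astype(int)
    src_y = np.rint(-math.sin(theta) * dx + math.cos(theta) * dy + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)  # pixels that map outside the source stay zero
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def make_rotation_sample(rgb: np.ndarray, depth: np.ndarray, rng: np.random.Generator):
    """Rotate RGB and depth by the same arbitrary angle and build pseudo-labels.

    The label set below is an assumed example, not taken from the paper.
    """
    angle = float(rng.uniform(0.0, 360.0))  # any angle in [0, 360)
    rgb_rot = rotate_nearest(rgb, angle)
    depth_rot = rotate_nearest(depth, angle)  # same angle keeps modalities aligned
    labels = {
        "angle_deg": angle,                # regression-style target
        "quadrant": int(angle // 90) % 4,  # coarse 4-way classification target
        "residual_deg": angle % 90.0,      # fine angle within the quadrant
    }
    return rgb_rot, depth_rot, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
    depth = rng.random((8, 8)).astype(np.float32)
    rgb_rot, depth_rot, labels = make_rotation_sample(rgb, depth, rng)
    print(rgb_rot.shape, depth_rot.shape, sorted(labels))
```

In a pretraining loop, a network would receive the rotated pair and be asked to predict these labels. Nearest-neighbor sampling is used here only to keep the sketch dependency-free; an interpolating rotation from an image library would be the more usual choice.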

Key words

self-supervised learning / pretext task / contrastive learning / RGB-D / multi-modal

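Funding

National Natural Science Foundation of China, General Program (62376266)

Basic Frontier Science Research Program of the Chinese Academy of Sciences, "From 0 to 1" Original Innovation Project (ZDBS-LY-7024)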

Publication year

2024
Computer Systems & Applications
Institute of Software, Chinese Academy of Sciences

CSTPCD
Impact factor: 0.449
ISSN: 1003-3254
Number of references: 1