Journal of Graphics (图学学报), 2024, Vol. 45, Issue 3: 454-463. DOI: 10.11996/JG.j.2095-302X.2024030454

Monocular depth estimation combining pyramid structure and attention mechanism

LI Tao¹, HU Ting¹, WU Dandan¹

Author information

  • 1. School of Electrical Engineering and Electronic Information, Xihua University, Chengdu 610039, Sichuan, China

Abstract

Monocular depth estimation predicts a dense depth map from a single color image. To address the blurred boundaries and insufficient capture of contextual information in current monocular depth estimation algorithms, an algorithm combining a pyramid structure and an attention mechanism was proposed. The algorithm adopted an encoder-decoder framework. The encoder used the PVTv2 network, exploiting the strength of Transformer networks in modeling global information to obtain richer global semantic information. The decoder consisted of a main depth estimation branch and two pyramid sub-branches. The main branch used spatial and channel attention mechanisms to adaptively focus on the important feature regions and feature channels among the encoder and decoder features. The Laplacian pyramid sub-branch and the depth residual pyramid sub-branch were designed to learn rich local information from the color image and from the depth features of the main branch, and to pass it to the main branch, further alleviating the missing details and disordered structures in monocular depth estimation. Experimental results showed that, compared with the advanced algorithm P3Depth, on the indoor public dataset NYU Depth V2 the δ1.25 threshold accuracy increased by 1.22%, and the absolute error and root mean square error decreased by 5.8% and 2.8%, respectively. On the outdoor public dataset KITTI, the absolute error, root mean square logarithmic error, and root mean square error decreased by 8.5%, 3.9%, and 0.4%, respectively. The algorithm improved depth estimation accuracy and produced good visual results.
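The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in the monocular depth estimation literature, and in particular assumes that "absolute error" means the mean absolute relative error; the abstract does not spell out the formulas, so the function name `depth_metrics` and its dictionary keys are hypothetical and not taken from the paper.

```python
import math


def depth_metrics(pred, gt, threshold=1.25):
    """Compute common depth metrics over paired depth values.

    Assumption: these are the conventional definitions, not formulas
    quoted from the paper. `pred` and `gt` are flat sequences of depths.
    """
    # Keep only pixels where both prediction and ground truth are positive;
    # benchmark depth maps typically mark missing ground truth with 0.
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    n = len(pairs)

    # Threshold accuracy: share of pixels with max(p/g, g/p) < threshold.
    delta = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n
    # Mean absolute relative error.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean square error in depth units.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Root mean square error of the log depths.
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    return {"delta": delta, "abs_rel": abs_rel, "rmse": rmse, "rmse_log": rmse_log}


if __name__ == "__main__":
    # Two pixels: one predicted exactly, one predicted at half the true depth.
    print(depth_metrics([2.0, 2.0], [2.0, 4.0]))
```

Higher is better for the threshold accuracy, and lower is better for the three error measures, which is why the abstract reports an increase for the former and decreases for the latter.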

Keywords

deep learning / monocular depth estimation / pyramid structure / attention mechanism / Transformer

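Funding

Sichuan Science and Technology Program (2021YJ0109)

National Natural Science Foundation of China (61901392)

National Natural Science Foundation of China (62041109)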

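Publication year

2024

Journal of Graphics (图学学报), China Graphics Society (中国图学学会)
Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.73
ISSN: 2095-302X
Number of references: 37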