Monocular depth estimation combining pyramid structure and attention mechanism

Abstract: Monocular depth estimation predicts a dense depth map from a single color image. To address the blurred boundaries and insufficient capture of contextual information in current monocular depth estimation algorithms, a monocular depth estimation algorithm combining a pyramid structure and attention mechanisms is proposed. The algorithm adopts an overall encoder-decoder framework. The encoder uses the PVTv2 network, exploiting the strength of Transformer networks in modeling global information to obtain richer global semantic information. The decoder consists of a main depth estimation branch and two pyramid sub-branches. The main branch uses spatial and channel attention mechanisms to adaptively focus on the important feature regions and feature channels across the encoder and decoder features. The Laplacian pyramid sub-branch and the depth residual pyramid sub-branch learn rich local information from the color image and from the depth features of the main branch and pass it to the main branch, further alleviating the missing details and disordered structures seen in monocular depth estimation. Experimental results show that, compared with the advanced algorithm P3Depth, on the indoor public dataset NYU Depth V2 the δ < 1.25 threshold accuracy improves by 1.22%, and the absolute error and root mean square error decrease by 5.8% and 2.8%, respectively. On the outdoor public dataset KITTI, the absolute error, root mean square logarithmic error, and root mean square error decrease by 8.5%, 3.9%, and 0.4%, respectively. The algorithm improves depth estimation accuracy and produces good visual results.
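The error measures named in the abstract can be illustrated with a minimal sketch. This record does not give the paper's exact definitions, so the formulas below are the ones commonly used on NYU Depth V2 and KITTI. In particular, reading "absolute error" as the mean absolute relative error is an assumption, and the function name `depth_metrics` and its dictionary keys are hypothetical.

```python
import math


def depth_metrics(pred, gt, threshold=1.25):
    """Common depth-estimation metrics over paired, strictly positive depths.

    Assumed (standard) definitions, not taken from the paper itself:
      delta    : fraction of pixels with max(pred/gt, gt/pred) < threshold
      abs_rel  : mean of |pred - gt| / gt
      rmse     : sqrt(mean((pred - gt)^2))
      rmse_log : sqrt(mean((ln pred - ln gt)^2))
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    if any(p <= 0 or g <= 0 for p, g in zip(pred, gt)):
        raise ValueError("depth values must be strictly positive")

    n = len(pred)
    pairs = list(zip(pred, gt))

    # Threshold accuracy: higher is better.
    delta = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n
    # Error measures: lower is better.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n)

    return {"delta": delta, "abs_rel": abs_rel, "rmse": rmse, "rmse_log": rmse_log}


if __name__ == "__main__":
    # Toy flattened depth maps in meters; the third pixel is off by a factor of 2.
    predicted = [1.0, 2.0, 4.0]
    ground_truth = [1.0, 2.0, 2.0]
    print(depth_metrics(predicted, ground_truth))
```

In practice these quantities are computed only over pixels with valid ground-truth depth, which is why the sketch takes flat lists of already-filtered values rather than full images.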

Keywords:

deep learning; monocular depth estimation; pyramid structure; attention mechanism; Transformer

Authors:

LI Tao (李滔), HU Ting (胡婷), WU Dandan (武丹丹)

Affiliation:

School of Electrical Engineering and Electronic Information, Xihua University, Chengdu, Sichuan 610039, China

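Funding:

Sichuan Science and Technology Program; National Natural Science Foundation of China; National Natural Science Foundation of China

Project numbers:

2021YJ0109; 61901392; 62041109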

Year of publication:

2024

DOI:

10.11996/JG.j.2095-302X.2024030454

Journal of Graphics (图学学报)

China Graphics Society (中国图学学会)

CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.73

ISSN: 2095-302X

Year, volume (issue): 2024, 45(3)

Number of references: 37