Deep learning-based multi-view dense matching with joint depth and surface normal estimation
In recent years, deep learning-based multi-view stereo matching methods have demonstrated significant potential in 3D reconstruction tasks. However, they still exhibit limitations in recovering fine geometric details of scenes. In some traditional multi-view stereo matching methods, the surface normal often serves as a crucial geometric constraint that assists finer depth inference. Nevertheless, surface normal information, which encapsulates the geometric structure of the scene, has not been fully exploited in modern learning-based methods. This paper introduces a deep learning-based joint depth and surface normal estimation method for multi-view dense matching and 3D scene reconstruction. The proposed method employs a multi-stage pyramid structure to simultaneously infer depth and surface normals from multi-view images and to promote their joint optimization. It consists of a feature extraction module, a normal-assisted depth estimation module, a depth-assisted normal estimation module, and a depth-normal joint optimization module. Specifically, the depth estimation module constructs a geometry-aware cost volume by integrating surface normal information for fine depth estimation. The normal estimation module utilizes depth constraints to build a local cost volume for inferring fine-grained normal maps. The joint optimization module further enhances the geometric consistency between depth and normal estimates. Experimental results on the WHU-OMVS dataset demonstrate that the proposed method performs exceptionally well in both depth and surface normal estimation, outperforming existing methods. Furthermore, 3D reconstruction results on two different datasets indicate that the proposed method effectively recovers the geometric structures of both local high-curvature areas and global planar regions, yielding well-structured, high-quality 3D scene models.
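Geometry-aware cost volumes of the kind described in the abstract are commonly built by plane-sweep warping, where each depth hypothesis, tilted by the local surface normal, induces a homography between the reference and a source view. The following minimal sketch (an assumption for illustration, not the paper's actual implementation; the function names `plane_homography` and `warp_pixel` are hypothetical) shows this standard plane-induced homography:

```python
import numpy as np

def plane_homography(K_ref, K_src, R, t, n, d):
    """Homography induced by the plane n.X = d in the reference frame.

    Illustrative sketch of plane-sweep geometry, not the paper's method.
    K_ref, K_src: 3x3 intrinsics of reference and source cameras.
    R, t: pose mapping reference coordinates to source: x_src = R @ x_ref + t.
    n: unit plane normal in the reference frame (tilted by the estimated
       surface normal, instead of the fronto-parallel [0, 0, 1]).
    d: plane-to-origin distance (the depth hypothesis for n = [0, 0, 1]).
    """
    # For X on the plane, n.X = d, so t = t * (n.X) / d and the mapping
    # x_src = (R + t n^T / d) X becomes linear; lift it to pixel coordinates.
    return K_src @ (R + np.outer(t, n) / d) @ np.linalg.inv(K_ref)

def warp_pixel(H, uv):
    """Warp one reference pixel (u, v) into the source view via H."""
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]  # perspective division back to pixel coordinates
```

Sweeping `d` over a set of hypotheses and comparing reference features with warped source features at each pixel yields one cost-volume slice per hypothesis; using the estimated normal for `n` instead of a fronto-parallel plane is what makes the volume geometry-aware.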