Monocular Image Depth Estimation Based on Fully Convolutional Network with a Symmetric Encoder-Decoder Architecture
Monocular depth estimation infers the spatial position of every pixel from an image taken from a single viewpoint, which is of great significance for scene understanding and 3D reconstruction. To increase the amount of information captured by the predicted depth map while preserving key details, this paper designs a fully convolutional network with a symmetric encoder-decoder structure for the depth estimation task, called ResUNet. The network inherits the classical U-Net architecture: an improved ResNet performs feature encoding, and the U-Net decoder design is retained to decode the feature maps into a depth map. Combining ResNet and U-Net in this way exploits the strengths of both through joint optimization, preserving as much spatial structure and detail as possible during depth estimation and thereby improving the fidelity and reliability of the predicted depth map. Building on this network, the ResDepth algorithm is further proposed; it is designed from the perspective of the loss function to improve the quality of the predicted depth map without introducing additional computational overhead. Finally, comparative experiments are carried out on three public datasets, NYU-Depth V2, SUN RGB-D, and KITTI, to evaluate the algorithm. The experiments show that the ResDepth algorithm and the joint loss function proposed in this paper better retain spatial structure and geometric detail, and thus improve the accuracy of the depth estimation results.
monocular depth estimation; fully convolutional network; dilated convolution; joint loss function
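
The abstract does not state which terms make up the joint loss function. As a minimal sketch only, assuming a pointwise L1 depth term combined with a finite-difference gradient term (a common pairing for preserving edges in depth maps), the idea of a joint loss might look like the following. The function names, the choice of terms, and the weight `lam` are all hypothetical and are not taken from the paper.

```python
# Illustrative joint depth loss: pointwise L1 term plus a gradient (edge) term.
# ASSUMPTION: the terms and the weight `lam` are placeholders chosen for
# illustration; the paper's actual joint loss is not given in the abstract.

def l1_term(pred, target):
    """Mean absolute depth error over all pixels of two equally sized 2D maps."""
    h, w = len(pred), len(pred[0])
    total = sum(abs(pred[i][j] - target[i][j]) for i in range(h) for j in range(w))
    return total / (h * w)


def gradient_term(pred, target):
    """Mean absolute mismatch between the horizontal and vertical
    finite differences of the predicted and ground-truth maps."""
    h, w = len(pred), len(pred[0])
    total, count = 0.0, 0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:  # horizontal neighbour exists
                gp = pred[i][j + 1] - pred[i][j]
                gt = target[i][j + 1] - target[i][j]
                total += abs(gp - gt)
                count += 1
            if i + 1 < h:  # vertical neighbour exists
                gp = pred[i + 1][j] - pred[i][j]
                gt = target[i + 1][j] - target[i][j]
                total += abs(gp - gt)
                count += 1
    return total / count if count else 0.0


def joint_loss(pred, target, lam=0.5):
    """Weighted sum of the pointwise term and the gradient term."""
    return l1_term(pred, target) + lam * gradient_term(pred, target)


if __name__ == "__main__":
    target = [[0.0, 1.0], [0.0, 1.0]]   # sharp vertical depth edge
    blurred = [[0.5, 0.5], [0.5, 0.5]]  # prediction that smooths the edge away
    print(joint_loss(blurred, target))
```

In this sketch, a prediction that is off by a constant depth offset is penalized only by the L1 term, whereas a prediction that blurs a depth edge also incurs the gradient term. Because a loss of this kind is evaluated only during training, it adds no cost at inference time, which is consistent with the abstract's claim of no additional computational overhead.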