Monocular depth estimation based on adaptive fusion of multi-scale depth maps
Depth estimation networks usually have many layers, which can lose substantial image information during feature encoding and decoding, so the predicted depth maps lack detailed object structures and have blurred edges. In this paper, a monocular depth estimation method based on the adaptive fusion of multi-scale depth maps is proposed, which better preserves object details and geometric contours. First, a squeeze-and-excitation residual network was employed, using channel attention mechanisms to encode the feature maps from different channels and preserve more detail in long-distance depth regions. Second, a multi-scale feature fusion network was adopted to fuse feature maps of different scales, producing feature maps rich in geometric and semantic information. Third, a multi-scale adaptive depth fusion network applied learnable weights to the depth maps generated from feature maps of different scales. Finally, the depth maps of different scales were adaptively fused, increasing the object information in the predicted depth maps. Experiments on the NYU Depth V2 dataset demonstrate an absolute relative error of 0.115 and a root mean square error of 0.525, and the accuracy can reach 99.3%. The depth maps predicted by the proposed method have higher accuracy and richer object information.
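The squeeze-and-excitation step described above reweights feature channels via global pooling followed by a bottleneck gating function. A minimal NumPy sketch of such a channel-attention block is given below; the reduction ratio and the two projection matrices (here random) are illustrative stand-ins for the learned fully connected layers of the actual network, not the paper's trained parameters.

```python
import numpy as np

def se_channel_attention(feature_map, reduction=4):
    """Squeeze-and-excitation sketch: reweight the channels of a
    (C, H, W) feature map. The random matrices below stand in for
    the two learned fully connected layers of a real SE block."""
    c, _, _ = feature_map.shape
    # Squeeze: global average pooling over spatial dimensions -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (ReLU then sigmoid gate)
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    gate = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))
    # Scale: per-channel reweighting of the original feature map
    return feature_map * gate[:, None, None]
```

The sigmoid gate keeps every channel weight in (0, 1), so informative channels are emphasized relative to the rest without changing the feature map's shape.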
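The adaptive fusion step can be sketched as a weighted sum of the per-scale depth predictions, where the weights are learnable parameters normalized by a softmax. The function below is a simplified illustration under the assumption that all depth maps have already been upsampled to a common resolution; `logits` stands in for the learned fusion weights.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_depth_fusion(depth_maps, logits):
    """Fuse S per-scale depth maps, each of shape (H, W), with
    softmax-normalized learnable weights (here passed as `logits`)."""
    weights = softmax(np.asarray(logits, dtype=float))  # (S,), sums to 1
    stacked = np.stack(depth_maps, axis=0)              # (S, H, W)
    # Weighted sum across the scale axis -> fused (H, W) depth map
    return np.tensordot(weights, stacked, axes=1)
```

With equal logits this reduces to a plain average of the scales; training would adjust the logits so that scales carrying more reliable detail dominate the fused prediction.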