What Have the Single Image Based Depth Prediction Models Learnt
Recently, single-image depth prediction via deep learning has achieved tremendous progress, and impressive prediction accuracy has been reported on both indoor and outdoor benchmark datasets. However, what do such models actually learn from single images under either a supervised or a self-supervised learning framework? This fundamental question has rarely been discussed in the literature. In this work, it is investigated from two aspects. First, for texture-poor or textureless regions, we test whether the predicted depths are essentially a filling-in of the depths in their close neighborhood. Second, we assess whether regions with high visual saliency tend to have better depth prediction performance. Our test results show that the predicted depths in texture-poor regions are indeed highly correlated with the depths in their close neighborhood, whereas the accuracy of the estimated depth is not particularly related to the visual saliency of the input image. These results could be of reference value for both model analysis and model design. For example, the visual saliency of the input image could be taken into account in model design and training to enhance the prediction accuracy in high-saliency regions, so as to better serve downstream vision tasks.
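As a rough illustration of the first test, the sketch below measures how strongly the predicted depth inside a texture-poor mask correlates with the mean predicted depth of nearby pixels outside the mask. It is only one plausible way to quantify a filling-in effect and is not the protocol used in this work: the function names `box_sum` and `fill_in_correlation`, the square window of radius `r`, and the use of the Pearson coefficient are all assumptions made for illustration.

```python
import numpy as np


def box_sum(a, r):
    """Sum of `a` over a (2r+1)x(2r+1) window around each pixel (zero padding)."""
    k = 2 * r + 1
    p = np.pad(a, r)                            # zero-pad so every window is full size
    c = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))  # integral image
    return c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]


def fill_in_correlation(pred_depth, poor_mask, r=2):
    """Pearson correlation between depth at texture-poor pixels and the mean
    depth of their non-masked neighbours (hypothetical measure, see lead-in)."""
    outside = (~poor_mask).astype(float)
    nb_sum = box_sum(pred_depth * outside, r)   # summed depth of outside neighbours
    nb_cnt = box_sum(outside, r)                # number of outside neighbours
    valid = poor_mask & (nb_cnt > 0)            # masked pixels with at least one neighbour
    nb_mean = nb_sum[valid] / nb_cnt[valid]
    return np.corrcoef(pred_depth[valid], nb_mean)[0, 1]


# Toy example: a horizontal depth ramp with a two-column texture-poor strip.
depth = np.tile(np.arange(10, dtype=float), (10, 1))
mask = np.zeros((10, 10), dtype=bool)
mask[:, 4:6] = True
rho = fill_in_correlation(depth, mask, r=2)
```

A value of `rho` close to 1 would be consistent with the masked depths following their surroundings. In practice the mask would have to come from some texture measure (for example local gradient magnitude), which is left unspecified here.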