Unsupervised monocular image depth prediction in football sports scenes based on improved FeatDepth
[Objective] To reduce the cost and improve the accuracy of depth estimation models in image depth prediction, and to apply such a model to complex football sports scenes, an unsupervised monocular image depth prediction method based on an improved FeatDepth is proposed. The monocular depth estimation model is used to obtain the relative depth of the players, the football, and the goal in a football scene and to calculate the distances between these targets, which can support applications such as auxiliary football training and offside monitoring.

[Methods] First, an attention mechanism was introduced into the original FeatDepth method so that the model pays more attention to effective feature information. Second, the GAM global attention mechanism module was embedded into the PoseNet and DepthNet networks of FeatDepth, adding context information to the networks and improving the depth prediction performance of the FeatDepth model without increasing the computational cost. Third, because football scenes place higher demands on depth prediction, the loss function scheme of the original FeatDepth method was retained to ensure good performance in low-texture areas and fine details. The final loss function mainly combines a single-view reconstruction loss and a cross-view reconstruction loss: the single-view reconstruction loss adds discriminative and convergent losses to the image reconstruction loss, and the cross-view reconstruction loss consists of feature-metric and photometric losses. Next, a dataset was built by selecting the parts of the public KITTI dataset that contain more people, giving 4,721 images in the training set, 631 images in the validation set, and 584 images in the test set. Finally, comparative experiments were conducted to verify the effectiveness of the improvement strategy.

[Results] The improved model with the GAM global attention mechanism module is called G-FeatDepth. In comparative experiments on this dataset, G-FeatDepth improved on the original FeatDepth model in every evaluation metric: the absolute relative error decreased by 0.007, the squared relative error by 0.051, the root-mean-square error by 0.032, and the root-mean-square logarithmic error by 0.005, while the threshold accuracies δ < 1.25, δ < 1.25², and δ < 1.25³ increased by 0.009, 0.004, and 0.002, respectively. The improvement therefore both reduces the error metrics and raises the accuracy, and the experimental data confirm the gain in model performance. The actual inference results on the dataset also show that G-FeatDepth predicts depth better than the other models in low-texture areas and fine details.

[Conclusions] Comparing the inference results of each model on football-scene images shows that G-FeatDepth gives better depth predictions in low-texture regions and fine details (e.g., the football, goals, and limbs). It thus better meets the requirements of depth prediction in football scenes, and the unsupervised monocular depth estimation model is successfully applied to football sports scenes.
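The Objective mentions turning relative depth into distances between targets (players, ball, goal). The abstract does not say how this is done; one common approach, assumed here, is to back-project each target's pixel centre through a pinhole camera model and take the Euclidean distance between the resulting 3D points. The `Intrinsics` values, the pixel centres, and the `scale` factor are hypothetical: a self-supervised monocular model predicts depth only up to an unknown scale, so some external cue (for example a known object size) would be needed to express the result in metres.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical values in the example)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u: float, v: float, depth: float, k: Intrinsics) -> tuple:
    """Map pixel (u, v) with depth along the optical axis to a camera-frame 3D point."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


def target_distance(p1, p2, depth_map, k: Intrinsics, scale: float = 1.0) -> float:
    """Euclidean distance between two targets given their (u, v) pixel centres.

    depth_map is indexed as depth_map[v][u] (row, column). `scale` is a
    hypothetical factor converting the model's relative depth to metric depth.
    """
    (u1, v1), (u2, v2) = p1, p2
    a = backproject(u1, v1, scale * depth_map[v1][u1], k)
    b = backproject(u2, v2, scale * depth_map[v2][u2], k)
    return math.dist(a, b)
```

For instance, `target_distance((u_ball, v_ball), (u_foot, v_foot), depth, k)` would give the ball-to-foot distance in the model's relative units when `scale` is left at 1.0. Using a single pixel per target is the simplest choice; averaging depth over a detection box would be more robust to noisy predictions.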
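The Methods embed a GAM (Global Attention Mechanism) module in PoseNet and DepthNet. As a rough illustration, the PyTorch sketch below follows the commonly cited GAM layout: a channel submodule that permutes the feature map and applies a two-layer MLP across channels at every position, followed by a spatial submodule of two 7×7 convolutions, each gated by a sigmoid. The reduction ratio `rate=4` and the class name are assumptions, and the abstract does not state where in the two networks the module is inserted, so this is not the authors' code.

```python
import torch
import torch.nn as nn


class GAM(nn.Module):
    """Sketch of a GAM block: channel attention, then spatial attention.

    `rate` (assumed) is the channel-reduction ratio of the hidden layers.
    Input and output shapes are both (B, C, H, W).
    """

    def __init__(self, channels: int, rate: int = 4):
        super().__init__()
        hidden = max(channels // rate, 1)
        # Channel submodule: an MLP applied to the channel vector at each pixel.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )
        # Spatial submodule: two 7x7 convolutions without pooling,
        # so spatial resolution is preserved.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=7, padding=3),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, H, W, C) so nn.Linear acts on the channel axis,
        # then back to (B, C, H, W).
        channel_att = self.channel_mlp(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = x * torch.sigmoid(channel_att)
        # Spatial gate computed from the channel-refined features.
        x = x * torch.sigmoid(self.spatial(x))
        return x
```

Because the output shape equals the input shape, such a block can be dropped after an encoder or decoder stage, e.g. `feat = GAM(64)(feat)` for a 64-channel feature map. Its added parameter count can be checked with `sum(p.numel() for p in GAM(64).parameters())`.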
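The loss composition described in the Methods can be written compactly as below. Here $I$ is the input image, $\phi(p)$ the learned feature at pixel $p$, and $\nabla^{1}$, $\nabla^{2}$ first- and second-order spatial gradients. The per-pixel forms of the discriminative and convergent terms are given as we recall them from the original FeatDepth formulation and should be checked against that source; the weights $\alpha$ and $\beta$ are not stated in the abstract, and any further terms implied by "mainly" are omitted.

```latex
\begin{aligned}
\mathcal{L}        &= \mathcal{L}_{s} + \mathcal{L}_{c},\\
\mathcal{L}_{s}    &= \mathcal{L}_{rec} + \alpha\,\mathcal{L}_{dis} + \beta\,\mathcal{L}_{cvt},\\
\mathcal{L}_{c}    &= \mathcal{L}_{fm} + \mathcal{L}_{ph},\\
\mathcal{L}_{dis}  &= -\sum_{p} \lvert \nabla^{1}\phi(p) \rvert_{1}\, e^{-\lvert \nabla^{1} I(p) \rvert_{1}},\\
\mathcal{L}_{cvt}  &= \sum_{p} \lvert \nabla^{2}\phi(p) \rvert_{1}\, e^{-\lvert \nabla^{2} I(p) \rvert_{1}}.
\end{aligned}
```

The negative sign on $\mathcal{L}_{dis}$ rewards large feature gradients where the image gradient is small, which is what helps in low-texture regions, while $\mathcal{L}_{cvt}$ penalizes feature curvature to keep the feature-metric loss smooth enough to optimize.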
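The Results report changes in seven standard monocular depth metrics. A minimal NumPy sketch of how these metrics are usually computed over valid pixels is shown below. The abstract does not give the evaluation protocol, so the optional median scaling (common for scale-ambiguous self-supervised models) is an assumption, and the usual KITTI steps such as depth capping and image cropping are left out.

```python
import numpy as np


def depth_metrics(gt, pred, median_scale=False):
    """Standard error and accuracy metrics for monocular depth estimation.

    gt, pred: positive depths at the valid evaluation pixels (same shape).
    median_scale: rescale pred so its median matches gt (assumed protocol).
    """
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    if median_scale:
        pred = pred * (np.median(gt) / np.median(pred))

    diff = gt - pred
    # Threshold accuracy: fraction of pixels with max(gt/pred, pred/gt) < 1.25**k.
    ratio = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": float(np.mean(np.abs(diff) / gt)),
        "sq_rel": float(np.mean(diff ** 2 / gt)),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "rmse_log": float(np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))),
        "a1": float(np.mean(ratio < 1.25)),
        "a2": float(np.mean(ratio < 1.25 ** 2)),
        "a3": float(np.mean(ratio < 1.25 ** 3)),
    }
```

Lower is better for `abs_rel`, `sq_rel`, `rmse`, and `rmse_log`; higher is better for `a1` to `a3` (δ < 1.25, δ < 1.25², δ < 1.25³), which matches the direction of the changes reported for G-FeatDepth.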
Keywords: football sports scenes; unsupervised monocular depth estimation; FeatDepth; attention mechanism; GAM; image reconstruction