A Stereo Matching Network for Weak Texture Objects
Existing depth estimation methods suffer from insufficient feature extraction, and in particular poor local feature extraction, in high-resolution images. To address this, a Transformer-based stereo matching network oriented to global features is proposed. The network adopts an end-to-end encoder-decoder architecture with a multi-head attention mechanism, which allows the model to attend to different features in different subspaces and thereby improves its feature extraction ability. Combining the self-attention mechanism with a feature reconstruction window strengthens the feature representation, compensates for the shortage of local features, and reduces the high computational complexity of the Transformer architecture, so that the computational complexity of the model stays within a linear range. Experiments on the Scene Flow and KITTI-2015 datasets show that the relevant metrics improve significantly over existing methods, which verifies the effectiveness and practicality of the model.
depth estimation; encoder-decoder; self-attention mechanism; feature reconstruction window; global context information
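
The abstract's claim that restricting self-attention to windows keeps the cost linear can be illustrated with a small sketch. The code below is not the paper's feature reconstruction window, whose details the abstract does not give; it is a generic multi-head self-attention over non-overlapping windows, written in NumPy under stated assumptions. The function name `window_attention`, the random projection weights, and the window and head sizes are all hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(feat, window, heads, rng):
    """Multi-head self-attention inside non-overlapping windows.

    feat:   (H, W, C) feature map
    window: window side length (assumed to divide H and W)
    heads:  number of attention heads (assumed to divide C)
    rng:    NumPy Generator used for the placeholder weights
    """
    H, W, C = feat.shape
    assert H % window == 0 and W % window == 0 and C % heads == 0
    d = C // heads
    N = window * window  # tokens per window

    # Hypothetical random projections; a trained network would learn these.
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

    # Partition the map into windows: (num_windows, N, C).
    x = feat.reshape(H // window, window, W // window, window, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, N, C)

    def split_heads(t):
        # (num_windows, N, C) -> (num_windows, heads, N, d)
        return t.reshape(t.shape[0], N, heads, d).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)

    # Each head attends within its own subspace, and only inside one window.
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(-1, N, C)

    # Undo the window partition to recover the (H, W, C) layout.
    out = out.reshape(H // window, W // window, window, window, C)
    out = out.transpose(0, 2, 1, 3, 4).reshape(H, W, C)
    return out, attn

feat = np.random.default_rng(0).standard_normal((8, 16, 32))
out, attn = window_attention(feat, window=4, heads=4,
                             rng=np.random.default_rng(1))
```

With a fixed window side w, each head computes (HW / w^2) * w^4 = HW * w^2 attention scores, which grows linearly with the number of pixels HW, whereas global self-attention computes (HW)^2 scores. The trade-off in this plain sketch is that tokens in different windows never interact; how the paper's network recovers global context is not specified in the abstract and is not modelled here.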