Existing depth estimation methods suffer from insufficient feature extraction and poor local feature extraction on high-resolution images. To address this, a Transformer stereo matching network oriented to global features is proposed. The network adopts an end-to-end encoder-decoder architecture with a multi-head attention mechanism, which allows the model to attend to different features in different subspaces and thus improves its feature extraction ability. By combining the self-attention mechanism with a feature reconstruction window, the model strengthens its feature representation to compensate for the lack of local features and effectively mitigates the high computational complexity of the Transformer architecture, keeping the model's computational cost within a linear range. Experiments on the Scene Flow and KITTI-2015 datasets show that, compared with existing methods, the relevant metrics are significantly improved, which verifies the effectiveness and practicality of the model.
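As a rough illustration of the mechanism described above (not the paper's implementation), the sketch below shows window-partitioned multi-head self-attention: attention is computed only inside fixed-size windows, so the cost grows linearly with image size rather than quadratically. The class name, window size, and feature dimensions are assumptions made for the example.

```python
# Minimal sketch, assuming a PyTorch encoder feature map of shape (B, H, W, C).
import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    def __init__(self, dim, num_heads=4, window=8):
        super().__init__()
        self.window = window
        # Multi-head attention lets each head attend to a different subspace.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, H, W, C); H and W assumed divisible by the window size.
        B, H, W, C = x.shape
        w = self.window
        # Partition the feature map into non-overlapping w x w windows.
        x = x.view(B, H // w, w, W // w, w, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)
        # Self-attention inside each window: w*w tokens per window, and the
        # number of windows grows linearly with H*W.
        out, _ = self.attn(x, x, x)
        # Reverse the window partition back to (B, H, W, C).
        out = out.view(B, H // w, W // w, w, w, C)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return out

# Example: a 64x64 feature map with 96 channels.
feat = torch.randn(2, 64, 64, 96)
print(WindowSelfAttention(96)(feat).shape)  # torch.Size([2, 64, 64, 96])
```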
Keywords
depth estimation/encoder-decoder/self-attention mechanism/feature reconstruction window/global context information