Multiview Stereo Reconstruction with Feature Aggregation Transformer
In this study, a multiview stereo reconstruction network based on a feature aggregation Transformer was proposed to address ambiguous matching in regions with weak texture or non-Lambertian surfaces, a problem caused by the limited understanding of the global image context and of inter-image relationships in existing multiview stereo methods. First, features were extracted from the input images by a feature pyramid network fused with deformable convolutions, which adaptively adjusts the size and shape of the receptive field. Subsequently, a Transformer-based spatial aggregation module was introduced for feature aggregation: an intra-image self-attention mechanism captures global contextual information within each view, and an inter-image cross-attention mechanism efficiently models information interaction across views, so that scene texture features are captured more accurately and reliable feature matching is achieved. Finally, visibility-based cost aggregation was employed to estimate per-pixel visibility and suppress noisy and mismatched pixels during cost aggregation. Experimental results on the DTU and Tanks & Temples datasets show that the proposed method achieves superior reconstruction performance compared with other methods.
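As a rough illustration of the Transformer-based spatial aggregation described above, the following PyTorch sketch applies intra-image self-attention to a reference feature map and then inter-image cross-attention from the reference view to one source view over flattened feature tokens. The module name, layer sizes, single-block structure, and use of nn.MultiheadAttention are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (assumed structure, not the paper's code): one aggregation
# block combining intra-view self-attention and inter-view cross-attention.
import torch
import torch.nn as nn

class SpatialAggregationBlock(nn.Module):
    def __init__(self, dim: int = 32, num_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, ref_feat: torch.Tensor, src_feat: torch.Tensor) -> torch.Tensor:
        # ref_feat, src_feat: (B, C, H, W) feature maps from the feature extractor.
        b, c, h, w = ref_feat.shape
        ref = ref_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        src = src_feat.flatten(2).transpose(1, 2)

        # Intra-image self-attention: global context within the reference view.
        ref = self.norm1(ref + self.self_attn(ref, ref, ref)[0])
        # Inter-image cross-attention: reference queries attend to source keys/values.
        ref = self.norm2(ref + self.cross_attn(ref, src, src)[0])
        # Position-wise feed-forward refinement.
        ref = self.norm3(ref + self.ffn(ref))
        return ref.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    block = SpatialAggregationBlock(dim=32, num_heads=4)
    ref = torch.randn(1, 32, 64, 80)   # reference-view features
    src = torch.randn(1, 32, 64, 80)   # one source-view feature map
    print(block(ref, src).shape)       # torch.Size([1, 32, 64, 80])
```

In practice such a block would be applied between the reference view and each source view before building the cost volume; the aggregated features would then feed the visibility-weighted cost aggregation stage mentioned in the abstract.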