Most existing deep-learning-based image splicing localization methods focus primarily on deep-level features with limited receptive fields and overlook shallow-level features, which degrades localization accuracy. To address this, a novel image splicing localization network, UMTransNet, is proposed, combining an improved U-Net architecture with a multi-scale multi-view Transformer. The U-Net encoder is enhanced by replacing its max-pooling layers with convolutional layers to prevent the loss of shallow-level features. In addition, the multi-scale multi-view Transformer is embedded into the skip connections of the U-Net, enabling effective fusion of the Transformer's output features with the U-Net's upsampled features, thereby balancing deep-level and shallow-level features and improving localization accuracy. Visualized detection maps show that the proposed method localizes spliced tampered regions more accurately.
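To make the encoder modification concrete, the following is a minimal illustrative sketch (not the paper's implementation, and framework-free for brevity) of why replacing 2×2 max pooling with a 2×2 strided convolution can preserve shallow-level detail: max pooling keeps only one value per window and discards the rest, whereas a strided convolution forms a learned weighted sum in which every input value contributes. The weight matrix `w` here is a hypothetical example.

```python
def max_pool_2x2(fm):
    """2x2 max pooling with stride 2: keeps one value per window,
    discarding the other three (shallow detail is lost)."""
    n = len(fm)
    return [[max(fm[i][j], fm[i][j + 1], fm[i + 1][j], fm[i + 1][j + 1])
             for j in range(0, n, 2)] for i in range(0, n, 2)]

def strided_conv_2x2(fm, w):
    """2x2 convolution with stride 2: every input value contributes
    through a learned weight, so fine detail is retained in the output."""
    n = len(fm)
    return [[w[0][0] * fm[i][j] + w[0][1] * fm[i][j + 1]
             + w[1][0] * fm[i + 1][j] + w[1][1] * fm[i + 1][j + 1]
             for j in range(0, n, 2)] for i in range(0, n, 2)]

# A toy 4x4 single-channel feature map.
fm = [[1, 9, 2, 0],
      [3, 1, 4, 4],
      [0, 2, 8, 1],
      [5, 5, 3, 3]]
w = [[0.25, 0.25], [0.25, 0.25]]  # hypothetical learned weights

print(max_pool_2x2(fm))         # [[9, 4], [5, 8]]
print(strided_conv_2x2(fm, w))  # [[3.5, 2.5], [3.0, 3.75]]
```

In a real network the convolution weights are learned end-to-end, so the downsampling operator can adapt to keep whichever shallow features are useful for localization, instead of unconditionally discarding all but the maximum activation.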
Key words
digital image forensics/image splicing localization/U-Net/multi-scale perception/self-attention mechanism/cross-attention mechanism