Synthetic aperture radar image ship classification based on ViT-CNN hybrid network
In recent years, the vision transformer (ViT) has achieved significant breakthroughs in image classification. However, it adapts poorly to ship classification in synthetic aperture radar (SAR) images because it lacks the ability to capture multi-scale and local features. To address this, this paper proposes a hybrid ViT-CNN network model for SAR image ship classification. A staged downsampling network structure is designed to overcome ViT's inability to capture multi-scale features. By incorporating convolutional structures into the three core modules of ViT, three new modules are designed: convolutional token embedding, convolutional parameter-sharing attention, and a local feed-forward network. These modules enable the network to capture both global and local features of ship images and further strengthen its inductive bias and feature extraction ability. Experimental results show that, on two commonly used SAR ship image datasets, OpenSARShip and FUSAR-Ship, the proposed model improves classification accuracy by 2.96% and 4.18%, respectively, over the best existing method, effectively improving the performance of SAR image ship classification.
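
The staged downsampling idea can be illustrated with a small shape calculation. The sketch below does not reproduce the paper's configuration: the 224x224 input size and the per-stage kernel, stride, and padding values are assumptions borrowed from common hierarchical ViT designs, and the names `conv_out_size`, `ASSUMED_STAGES`, and `stage_token_grid` are hypothetical. It only shows how a strided convolutional token embedding at each stage shrinks the token grid, which is what gives such a network features at several scales.

```python
from typing import List, Tuple


def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed (kernel, stride, padding) for the token embedding of each stage.
# These values are illustrative and are not taken from the paper.
ASSUMED_STAGES = [(7, 4, 2), (3, 2, 1), (3, 2, 1)]


def stage_token_grid(input_size: int,
                     stages: List[Tuple[int, int, int]] = ASSUMED_STAGES
                     ) -> List[Tuple[int, int]]:
    """Return (grid side length, token count) after each stage."""
    grids = []
    size = input_size
    for kernel, stride, padding in stages:
        # Each stage re-embeds the previous feature map with a strided
        # convolution, so the token grid shrinks stage by stage.
        size = conv_out_size(size, kernel, stride, padding)
        grids.append((size, size * size))
    return grids


if __name__ == "__main__":
    for i, (side, tokens) in enumerate(stage_token_grid(224), start=1):
        print(f"stage {i}: {side}x{side} grid, {tokens} tokens")
```

Under these assumed settings, a 224x224 input yields token grids of 56x56, 28x28, and 14x14, so later stages attend over fewer, coarser tokens while earlier stages retain fine spatial detail.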