Crop type classification of remote sensing image time series based on multi-scale spatial-temporal global attention model
With the development of deep learning, using deep learning methods to obtain accurate crop classification results from remote sensing image time series has become a research hotspot. Automatic, intelligent interpretation of fine crop types from remote sensing image time series plays an important role in agricultural resource investigation, supervision, and planning. Classical time series classification of remote sensing images is pixel-based and exploits only the temporal information of the series. However, spatial information about the shape, size, and distribution of ground objects also plays an important role in crop classification, so fully mining the spatial-temporal information of the time series to extract hybrid spatial-temporal features is beneficial. Existing deep learning methods, which rely on convolutional or recurrent neural networks, extract only local spatial or local temporal information; as a result, spatial-temporal information is inadequately utilized and classification accuracy remains low. In recent years, the self-attention mechanism has proven to be an effective way to fully utilize data information through global attention in both Natural Language Processing and Computer Vision. In this paper, we therefore propose a multi-scale spatial-temporal global attention model (MSSTGAM), which combines spatial and temporal self-attention mechanisms into a multi-scale spatial-temporal global attention mechanism that fully exploits the information in remote sensing image time series for fine classification of crop types. Specifically, MSSTGAM uses the Swin Transformer to process the spatial information of the time series and obtain outputs at different spatial scales, uses a lightweight temporal attention encoder (LTAE) to obtain spatial-temporal global features at the deepest spatial scale, and shares the temporal attention weights with the other spatial scales through a temporal sharing block, yielding multi-scale spatial-temporal global attention features for fine classification of crop types. The proposed method is evaluated on the publicly available PASTIS dataset and a custom-built Mississippi dataset, achieving overall classification accuracies of 83.4% and 86.7%, respectively. Moreover, the proposed method achieves the best F1 scores for most crop types, especially wheat, for which it improves on existing methods by 2.6% and 3.3% on the two datasets, respectively. These quantitative results demonstrate the effectiveness and application value of MSSTGAM for fine classification of crop types. Visualization shows that the classification results of the proposed method have better spatial consistency, and further visual analysis of the temporal attention weights reveals the basis on which the proposed method achieves fine classification of crops. The findings of this study show that multi-scale spatial-temporal global attention has significant theoretical and practical value. MSSTGAM can capture the global spatial-temporal evolution of land cover, which helps improve the spatial consistency and classification accuracy of fine crop types, making it more effective for fine classification of crop types from remote sensing image time series.
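The idea of computing temporal attention once at the deepest spatial scale and sharing it with finer scales can be illustrated with a minimal NumPy sketch. This is not the authors' MSSTGAM implementation. It assumes a single attention head with a learned "master" query shared by all pixels, random weights in place of trained ones, random arrays standing in for Swin Transformer outputs, and nearest-neighbour upsampling as the sharing mechanism. The function names `temporal_attention`, `share_attention`, and `pool_time` are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention(x, w_key, query):
    """Single-head temporal attention with one master query (assumption).

    x:      (T, C, h, w) features at the deepest spatial scale
    w_key:  (d, C) key projection (hypothetical, random here)
    query:  (d,) master query shared by all pixels (hypothetical)
    Returns weights of shape (T, h, w) that sum to 1 over the T dates.
    """
    keys = np.einsum("dc,tchw->tdhw", w_key, x)      # (T, d, h, w)
    scores = np.einsum("d,tdhw->thw", query, keys)   # (T, h, w)
    scores /= np.sqrt(w_key.shape[0])                # scaled dot product
    return softmax(scores, axis=0)


def share_attention(attn, factor):
    # Assumed sharing scheme: nearest-neighbour upsampling, so every
    # fine-scale pixel reuses the temporal weights of its coarse parent.
    return attn.repeat(factor, axis=1).repeat(factor, axis=2)


def pool_time(x, attn):
    # Attention-weighted sum over time: (T, C, H, W) -> (C, H, W)
    return np.einsum("thw,tchw->chw", attn, x)


# Usage with random stand-ins for two feature scales of a 6-date series
rng = np.random.default_rng(0)
T = 6
deep = rng.normal(size=(T, 32, 4, 4))       # deepest (coarsest) scale
shallow = rng.normal(size=(T, 16, 16, 16))  # finer scale, 4x resolution

attn = temporal_attention(deep, rng.normal(size=(8, 32)), rng.normal(size=8))
deep_feat = pool_time(deep, attn)                            # (32, 4, 4)
shallow_feat = pool_time(shallow, share_attention(attn, 4))  # (16, 16, 16)
print(deep_feat.shape, shallow_feat.shape)
```

In this sketch the attention weights are computed only once, at the coarsest scale, so every scale collapses its time axis with the same per-location temporal weighting and the attention cost does not grow with the resolution of the finer scales. A smoother interpolation, such as bilinear upsampling, would be an equally plausible assumption for the sharing step.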
Keywords: remote sensing image time series; crop type classification; self-attention mechanism; global attention; multi-scale spatial-temporal feature