A study of sample dependence and spatial extrapolation of models for crop remote sensing classification
Reducing reliance on in situ crop type samples is critical for remotely sensed crop type classification over large areas. This study used Suihua, a major grain-producing city in Heilongjiang Province, as an example to investigate the effect of sample size on crop type classification and to test whether supervised classification models trained on a small region can be extrapolated to a larger area. Specifically, the crop type classification model was trained in Beilin District and then extrapolated to all of Suihua. First, a parameter-optimized random forest model was trained on Sentinel-2 imagery covering the period from sowing to the mid-tasseling stage of maize and used to map the spatial distribution of crops in Beilin District in 2022. Overall accuracy (OA) gradually increased as the proportion of Gaussian Variate Generator (GVG) samples in Beilin District used for random forest training increased from 10% to 50%. The model performed best, with a maximum OA of 94.6%, when 50% of the GVG samples in Beilin District were used for crop classification, at which point maize, rice, and soybean had approximately 130 training samples. Beyond that point, model performance remained stable even as the number of in situ crop samples increased. The most important features for classifying maize, soybean, and rice were REP at the tasseling stage of maize, shortwave-infrared (SWIR) 1 at the pod stage of soybean, and the Land Surface Water Index (LSWI) during the transplanting stage of rice, respectively. Second, we extrapolated the best model trained in Beilin District to classify crop types across all of Suihua. The extrapolated model achieved an OA of 93.7% for crop type classification in Suihua, only 1.3% lower than that of the model trained directly in Suihua. The similarity of the spatial and probability distribution maps of crops produced by the Beilin and Suihua models indicates that a crop classification model trained in a small area and extrapolated can achieve results comparable to those of a model trained directly on the large area. Finally, we examined how model extrapolation is affected by distance, by the spatial representativeness and number of samples, and by the similarity of crop structure between the small source area and the target expansion area. Crops differ in their sensitivity to distance. The classification accuracy of rice is insensitive to changes in distance because the LSWI and SWIR1 of rice differ significantly from those of other crops, whereas the classification accuracy of maize and soybean generally declines as extrapolation distance increases. In summary, when crop classification models are built in small regions and the source and target areas have similar crop structures, both the number of samples and the representativeness of their spatial distribution should be considered. Doing so ensures that the model is adequately trained and extrapolates better spatially. The results of this study provide a cost-effective and efficient method for accurately classifying crops over large areas by using remote sensing. In addition, this study provides a scientific basis for developing crop sampling strategies, selecting sensitive bands, and determining the classification time window. It is also a valuable reference for developing model extrapolation methods with greater robustness and generalizability.
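The quantities named in the abstract can be made concrete with a minimal Python sketch. It is illustrative only and is not the authors' implementation. The LSWI function uses the commonly cited definition, (NIR - SWIR1) / (NIR + SWIR1), and the abstract does not say which Sentinel-2 bands were used. Overall accuracy is taken as the fraction of validation samples classified correctly. The per-class (stratified) draw in `stratified_subsample` is an assumption about how the 10% to 50% sample proportions might be selected, and all labels and reflectance values are hypothetical.

```python
import random
from collections import defaultdict


def lswi(nir, swir1):
    """Land Surface Water Index from NIR and SWIR1 surface reflectance."""
    denom = nir + swir1
    # Guard against division by zero for masked or empty pixels.
    return (nir - swir1) / denom if denom != 0 else 0.0


def overall_accuracy(reference, predicted):
    """Fraction of validation samples whose predicted class matches the reference."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(r == p for r, p in zip(reference, predicted))
    return correct / len(reference)


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a per-class random draw of `fraction` of the samples.

    Assumption: sampling is stratified by crop type; the source only states
    the overall proportions (10% to 50%).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        k = max(1, round(fraction * len(indices)))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical field labels: 20 samples per crop.
labels = ["maize"] * 20 + ["rice"] * 20 + ["soybean"] * 20
for fraction in (0.1, 0.2, 0.3, 0.4, 0.5):
    subset = stratified_subsample(labels, fraction)
    print(f"{fraction:.0%} of samples -> {len(subset)} training points")

# Made-up reflectances for a flooded paddy pixel: low SWIR1 yields a high LSWI.
print(round(lswi(0.30, 0.10), 3))

# Made-up validation labels: three of four predictions are correct.
print(overall_accuracy(["maize", "rice", "soybean", "rice"],
                       ["maize", "rice", "maize", "rice"]))
```

In a full experiment, each subset returned by `stratified_subsample` would be used to train a classifier, and `overall_accuracy` would be evaluated on held-out samples to trace how OA changes with sample proportion.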