A new training data sampling method for machine learning-based landslide susceptibility mapping
Training samples play an important role in machine learning-based regional landslide susceptibility evaluation. These samples consist of landslide (positive) and nonlandslide (negative) samples collected through various sampling methods. However, existing methods for positive sample collection do not measure the reliability of the collected samples, which leaves the reliability of the training data uncertain. To address this issue, this paper presents a landslide prototype-based sampling (PBS) method. The method measures the reliability of positive samples by the geographical similarity between a given point and the landslide positive-sample prototype, and the reliability of negative samples by their geographical dissimilarity. A reliability threshold, set with a mutual exclusion method, is then used to collect the training samples. The Youfanggou Basin in Gansu Province was chosen as the study area. PBS and existing representative sampling methods were used to build landslide susceptibility prediction models for the basin based on logistic regression, support vector machines, and random forests, and the susceptibility evaluation results obtained with reliable and nonreliable samples were compared. The evaluation performance showed a "fluctuating increase" with the reliability of the positive samples and a "positive correlation" with the reliability of the negative samples. Compared with the existing representative sampling methods, PBS improved the accuracy and the area under the receiver operating characteristic curve (AUC) of the susceptibility evaluation by at least 14.7% and 14%, respectively, across all three machine learning models, with a small standard deviation, which indicates that the proposed method is effective.
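The abstract describes the sampling logic only at a high level. The sketch below shows one way similarity-based reliability and a mutually exclusive threshold could fit together, and it rests on several assumptions that are not taken from the paper. Each point is a tuple of environmental factor values. Per-factor similarity uses a Gaussian kernel scaled by a hypothetical per-factor standard deviation `sds`, and the factor similarities are combined with a minimum (limiting-factor) rule. The similarity to the prototype set is the maximum over prototypes. The "mutual exclusion" is modeled as a single threshold greater than 0.5, so that no point can qualify as both positive and negative. All function names and the example data are hypothetical; this is not the authors' PBS implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # hypothetical: one value per environmental factor


def factor_similarity(a: float, b: float, sd: float) -> float:
    """Assumed Gaussian similarity between two values of a single factor."""
    return math.exp(-((a - b) ** 2) / (2.0 * sd ** 2))


def point_similarity(x: Point, proto: Point, sds: Sequence[float]) -> float:
    """Assumed limiting-factor rule: overall similarity is the minimum over factors."""
    return min(factor_similarity(a, b, s) for a, b, s in zip(x, proto, sds))


def prototype_similarity(x: Point, prototypes: Sequence[Point], sds: Sequence[float]) -> float:
    """Similarity of x to its most similar landslide prototype."""
    return max(point_similarity(x, p, sds) for p in prototypes)


def sample(candidates: Sequence[Point], prototypes: Sequence[Point],
           sds: Sequence[float], threshold: float) -> Tuple[List[Point], List[Point]]:
    """Split candidates into reliable positives and negatives; ambiguous points are dropped.

    Positive reliability = similarity; negative reliability = 1 - similarity.
    A threshold above 0.5 makes the two sets mutually exclusive (assumption).
    """
    if threshold <= 0.5:
        raise ValueError("threshold must exceed 0.5 for mutually exclusive sets")
    positives: List[Point] = []
    negatives: List[Point] = []
    for x in candidates:
        s = prototype_similarity(x, prototypes, sds)
        if s >= threshold:
            positives.append(x)
        elif 1.0 - s >= threshold:
            negatives.append(x)
    return positives, negatives


# Hypothetical example: two factors, one prototype, equal spreads.
prototypes = [(10.0, 30.0)]
sds = (5.0, 5.0)
candidates = [(10.0, 30.0), (11.0, 31.0), (40.0, 5.0), (15.0, 25.0)]
pos, neg = sample(candidates, prototypes, sds, threshold=0.7)
print(pos, neg)
```

In this toy run, the point (15.0, 25.0) has a similarity of exp(-0.5), about 0.61. That is too low to count as a reliable positive and too high to count as a reliable negative, so it is excluded from both sets. Discarding such ambiguous points is the intent behind a reliability threshold.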