POLLEN-BASED CLIMATE RECONSTRUCTION USING MACHINE LEARNING ON THE QINGHAI-TIBETAN PLATEAU
Pollen data are among the most important proxies of past climate change, and many quantitative reconstructions of past climate have accumulated in China. However, very little information is available on the reliability of machine learning methods for developing pollen-climate models in China, and comparisons of robustness among different machine learning methods are rare. In this study, a total of 1801 modern pollen assemblages from the Qinghai-Tibetan Plateau and its surrounding region were used to develop climate reconstruction models with three machine learning methods (random forest, boosted regression tree, and artificial neural network). Two traditional methods (modern analogue technique and weighted averaging partial least squares) were also applied to the same pollen dataset for comparison. The elevations of the modern pollen sites ranged from -154 m to 5231 m a.s.l. (mainly 1000–4800 m a.s.l.), and the sites were distributed across various vegetation types, including tropical rain forest, subtropical evergreen broadleaf forest, subtropical evergreen-deciduous broadleaf forest, warm temperate deciduous forest, subtropical mountain conifer forest, alpine shrubland, alpine meadow, alpine steppe, temperate steppe, and warm temperate/temperate desert. The sample types of the modern pollen assemblages included topsoil (831 samples), moss polsters (619 samples), and surface lake sediments (351 samples). The original 504 pollen types were transformed into 230 combined pollen types by merging synonyms and incorporating low-level taxonomic groups, and the 139 combined pollen types that occurred in at least 3 sites with a minimum percentage of 2% were used to establish the climate reconstruction models. The original pollen dataset was randomly divided into a training set to fit the climate reconstruction models and a test set to evaluate their performance. Climate factors of the modern pollen sites were interpolated using thin plate spline regression based on 30-year (1981–2010) climate data from meteorological stations in China. The mean warmest month precipitation and the mean coldest month temperature were selected as the most important climate factors in the study region for further analysis. Each method was used to establish climate reconstruction models based on both the modern pollen percentage data and the affinity scores of plant functional types calculated from the pollen percentages. The performance of the established models was evaluated by comparing the model predictions with the observed climate data of the pollen sites. The results showed that all five methods performed well in developing reconstruction models of the mean warmest month precipitation (RMSE < 31 mm and R² > 0.80) and the mean coldest month temperature (RMSE < 4.4 °C and R² > 0.70) based on both pollen percentage and plant functional type data. Among the three machine learning methods, random forest and boosted regression tree showed generally similar performance and produced highly reliable models. In addition, the plant functional type-based models of the mean coldest month temperature produced by random forest and boosted regression tree performed better than those of the other methods. The traditional weighted averaging partial least squares had the weakest overall performance among the five methods. The modern analogue technique, another traditional method, was the most promising method for fitting pollen-climate and plant functional type-climate models in most cases. However, the modern analogue technique is sensitive to autocorrelation, and assessments of its robustness based on modern pollen data tend to be over-optimistic. Machine learning methods such as those applied in this study, especially random forest and boosted regression tree, have great potential for reconstructing past climate change on the Qinghai-Tibetan Plateau. Our results suggest that the reliability of different methods may vary among regions and climate factors; therefore, a between-method comparison is necessary for selecting the most robust method to establish a pollen-climate model.
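
The evaluation procedure described above (a random training/test split, followed by RMSE and R² computed between predicted and observed climate values at the test sites) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 20% test fraction, and the random seed are assumptions, since the abstract does not report the split ratio. R² is computed here as the coefficient of determination; the squared correlation between predictions and observations is another common choice in the literature and may be what the study used.

```python
import math
import random


def train_test_split(n_samples, test_fraction=0.2, seed=0):
    """Randomly split sample indices into (train, test) index lists.

    test_fraction and seed are illustrative assumptions; the abstract
    does not state the proportion used.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Split the 1801 modern pollen sites into training and test indices.
    train_idx, test_idx = train_test_split(1801)
    print(len(train_idx), len(test_idx))

    # Toy observed/predicted values (hypothetical, not from the study).
    observed = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 39.0]
    print(rmse(observed, predicted), r_squared(observed, predicted))
```

In practice, any of the five methods would be fitted on the training indices only, and the two metrics would be computed on the held-out test sites, once per climate factor and per input type (pollen percentages or plant functional type scores).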