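An attempt to apply machine learning algorithms to quantitative pollen-climate reconstruction on the Qinghai-Tibetan Plateau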

POLLEN-BASED CLIMATE RECONSTRUCTION USING MACHINE LEARNING ON THE QINGHAI-TIBETAN PLATEAU

Qin Feng¹, Zhao Yan²


Author information

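1. Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101
2. Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101; University of Chinese Academy of Sciences, Beijing 100049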

Abstract (Chinese)

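Pollen is an important proxy for the quantitative reconstruction of past climate change, and a large number of pollen-based quantitative climate reconstructions have been carried out in China. However, only a few studies have used machine learning algorithms to build pollen-climate reconstruction models, and the reliability of different machine learning algorithms still needs to be tested. In this study, three machine learning algorithms (random forest, boosted regression tree and artificial neural network) were used to build quantitative climate reconstruction models. These models were based on modern pollen percentage data from the Qinghai-Tibetan Plateau and its surrounding regions, and on plant functional type (PFT) scores calculated from those percentages. Models built with two conventional methods, the modern analogue technique (MAT) and weighted averaging partial least squares (WA-PLS), served as a comparison for testing the reliability of machine learning algorithms in pollen-based climate reconstruction. The results show that all five methods produced highly reliable reconstruction models of mean warmest month precipitation and mean coldest month temperature, from both pollen percentages and PFT scores. Reliability was judged by two evaluation metrics: the root-mean-square error (RMSE) and the coefficient of determination (R²). Among the three machine learning algorithms, random forest and boosted regression tree were more reliable than the artificial neural network, and the models they produced performed very similarly. This was especially true for the PFT-mean coldest month temperature models, where these two algorithms performed best of all methods. Among the conventional methods, WA-PLS performed clearly worse than the three machine learning algorithms in most cases. MAT was slightly more reliable than the three machine learning algorithms, but because it is rather sensitive to spatial autocorrelation, its performance risks being overestimated. The machine learning algorithms used in this study, especially random forest and boosted regression tree, have strong potential for quantitative palaeoclimate reconstruction on the Qinghai-Tibetan Plateau. The suitability of different methods for building pollen-based climate reconstruction models may differ by region and by climate variable. Specific studies should therefore compare several methods and select the most reliable one to build the pollen-climate model.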
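The two evaluation metrics used to judge the models, RMSE and R², can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes R² is the coefficient of determination computed as 1 − SS_res/SS_tot; some pollen-climate studies instead report the squared Pearson correlation between observed and predicted values. The test-set values in the usage example are hypothetical.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted climate values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical test-set values of mean warmest month precipitation (mm).
    observed = [80.0, 120.0, 160.0, 200.0]
    predicted = [90.0, 110.0, 175.0, 185.0]
    print(f"RMSE = {rmse(observed, predicted):.2f} mm")
    print(f"R2   = {r_squared(observed, predicted):.3f}")
```

Both functions work in the units of the climate variable for RMSE (mm or °C here), while R² is dimensionless. That makes R² the more convenient metric for comparing precipitation and temperature models side by side.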

Abstract

Pollen data are one of the most important proxies of past climate change, and a large body of research on the quantitative reconstruction of past climate change has accumulated in China. However, very little information is available on the reliability of machine learning methods for developing pollen-climate models in China, and comparisons of robustness among different machine learning methods are rare. In this study, a total of 1801 modern pollen assemblages from the Qinghai-Tibetan Plateau and its surrounding region were used to develop climate reconstruction models with three machine learning methods (random forest, boosted regression tree, and artificial neural network). Two traditional methods (the modern analogue technique and weighted averaging partial least squares) were also applied to the same pollen dataset for comparison. The elevations of the modern pollen sites range from −154 m to 5231 m a.s.l. (mainly 1000–4800 m a.s.l.). The sites are distributed across various vegetation types: tropical rainforest, subtropical evergreen broadleaf forest, subtropical evergreen-deciduous broadleaf forest, warm temperate deciduous forest, subtropical mountain conifer forest, alpine shrubland, alpine meadow, alpine steppe, temperate steppe, and warm temperate/temperate desert. The modern pollen samples comprise topsoil (831 samples), moss polsters (619 samples), and surface lake sediments (351 samples). The original 504 pollen types were reduced to 230 combined pollen types by merging synonyms and incorporating low-level taxonomic groups. The 139 combined pollen types that occurred in at least 3 sites with a minimum percentage of 2% were used to establish the climate reconstruction models. The pollen dataset was randomly divided into a training set, used to fit the climate reconstruction models, and a test set, used to evaluate their performance. Climate values at the modern pollen sites were interpolated with thin plate spline regression from 30-year (1981–2010) climate data recorded at meteorological stations in China. The mean warmest month precipitation and the mean coldest month temperature were selected for further analysis as the most important climate factors in the study region. Each method was used to establish climate reconstruction models from both the modern pollen percentage data and the plant functional type affinity scores calculated from those percentages. Model performance was evaluated by comparing model predictions with the observed climate data of the pollen sites. The results showed that all five methods performed well in developing reconstruction models of the mean warmest month precipitation (RMSE < 31 mm and R² > 0.80) and the mean coldest month temperature (RMSE < 4.4 °C and R² > 0.70). This held for both pollen percentage and plant functional type data. Among the three machine learning methods, random forest and boosted regression tree performed similarly and produced highly reliable models. In addition, the plant functional type-mean coldest month temperature models produced by random forest and boosted regression tree performed better than those produced by the other methods. Weighted averaging partial least squares, a traditional method, had the weakest overall performance of the five methods. The modern analogue technique, the other traditional method, was the most promising method for fitting pollen-climate and plant functional type-climate models in most cases. However, the modern analogue technique is sensitive to spatial autocorrelation, so assessments of its robustness based on modern pollen data are likely to be over-optimistic. Machine learning methods such as those applied in this study, especially random forest and boosted regression tree, have great potential for reconstructing past climate change on the Qinghai-Tibetan Plateau. Our results suggest that the reliability of different methods may vary among regions and climate factors. A between-method comparison is therefore necessary to select the most robust method for establishing a pollen-climate model.
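The taxon-selection rule and the modern analogue technique described in the abstract can be sketched in plain Python. This is a hypothetical illustration, not the authors' implementation. The following are all assumptions: the reading of the 2%/3-site rule (a taxon must reach at least 2% in at least 3 samples), the squared-chord distance, the unweighted mean of the k nearest analogues, and the default k = 5. The pollen and climate values are invented. The machine learning methods and WA-PLS are not shown.

```python
import math
from typing import Dict, List, Sequence

# One pollen assemblage: taxon name -> percentage of the pollen sum (0-100).
Assemblage = Dict[str, float]


def select_taxa(samples: Sequence[Assemblage], min_sites: int = 3,
                min_pct: float = 2.0) -> List[str]:
    """Keep taxa that reach at least min_pct in at least min_sites samples."""
    counts: Dict[str, int] = {}
    for sample in samples:
        for taxon, pct in sample.items():
            if pct >= min_pct:
                counts[taxon] = counts.get(taxon, 0) + 1
    return sorted(taxon for taxon, n in counts.items() if n >= min_sites)


def squared_chord(a: Assemblage, b: Assemblage, taxa: Sequence[str]) -> float:
    """Squared-chord distance between two assemblages over the selected taxa."""
    return sum(
        (math.sqrt(a.get(t, 0.0) / 100.0) - math.sqrt(b.get(t, 0.0) / 100.0)) ** 2
        for t in taxa
    )


def mat_predict(fossil: Assemblage, modern: Sequence[Assemblage],
                climate: Sequence[float], taxa: Sequence[str], k: int = 5) -> float:
    """Unweighted mean climate of the k modern samples closest to `fossil`."""
    if len(modern) != len(climate) or not modern:
        raise ValueError("modern and climate must be non-empty and equal in length")
    # Rank modern samples by dissimilarity to the fossil assemblage.
    ranked = sorted(
        (squared_chord(fossil, m, taxa), c) for m, c in zip(modern, climate)
    )
    nearest = ranked[:k]
    return sum(c for _, c in nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical modern samples with mean coldest month temperature (deg C).
    modern = [
        {"Artemisia": 40.0, "Poaceae": 30.0, "Pinus": 5.0},
        {"Artemisia": 35.0, "Poaceae": 25.0, "Pinus": 10.0},
        {"Artemisia": 30.0, "Poaceae": 20.0, "Pinus": 20.0},
        {"Artemisia": 2.0, "Pinus": 50.0, "Quercus": 30.0},
        {"Artemisia": 1.0, "Pinus": 45.0, "Quercus": 35.0, "Betula": 1.5},
    ]
    climate = [-12.0, -10.0, -8.0, 2.0, 4.0]
    taxa = select_taxa(modern)
    fossil = {"Artemisia": 38.0, "Poaceae": 28.0, "Pinus": 6.0}
    print("retained taxa:", taxa)
    print("MAT estimate (k=2):", mat_predict(fossil, modern, climate, taxa, k=2))
```

The nearest analogues of a modern test sample are often nearby sites with similar climate, and a random training/test split does not separate them. This illustrates why the abstract cautions that MAT scores evaluated on modern data can be over-optimistic.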


Key words

pollen/quantitative climate reconstruction/machine learning/Qinghai-Tibetan Plateau


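Funding

National Key Research and Development Program of China (2022YFF0801504)

National Key Research and Development Program of China (2022YFF0801501)

National Natural Science Foundation of China (42071114)

National Natural Science Foundation of China (42277454)

National Natural Science Foundation of China (42242104)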

Publication year

2024

Quaternary Sciences (第四纪研究)

Institute of Geology and Geophysics, Chinese Academy of Sciences; China Quaternary Research Committee


Indexed in: CSTPCD, CSCD, Peking University Core Journals

Impact factor: 2.939

ISSN: 1001-7410

Number of references: 59
