Forest Fire Prediction Based on Dataset Reconstruction and Logistic Regression

Bai Haifeng (白海峰)¹, Niu Shukui (牛树奎)², Chen Feng (陈锋)², Liang Lizhuang (梁丽壮)³

Author information

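1. Shandong Police College, Jinan, 250200
2. School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083
3. CIECC Ecological Technology Research Institute (Beijing) Co., Ltd., Beijing, 100048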

Abstract

Taking Huma County in the Daxing'anling (Greater Khingan) region as a case study, the forest fire dataset was reconstructed with four data balancing methods: oversampling, undersampling, mixed sampling, and data synthesis. A binomial logistic regression model was then fitted to each balanced dataset. The balancing method with the best average predictive performance was selected to reconstruct the full-sample dataset and build the model, and the standardized regression coefficient method was used to assess the relative importance of each meteorological factor. Across 10 randomly partitioned samples, giving 40 balanced datasets in total, goodness of fit differed little among the oversampled, mixed-sampled, and synthetic datasets, and all three fitted slightly better than the undersampled datasets. Prediction accuracy ranked oversampling > mixed sampling > synthetic data > undersampling; the first three differed little from one another but were clearly higher than undersampling. Among the standardized regression coefficients of the meteorological factors, those of the Fine Fuel Moisture Code (FFMC), daily minimum relative humidity, and the Drought Code (DC) had the largest absolute values, at 0.771, 0.720, and 0.527, respectively. The balanced datasets obtained by oversampling, mixed sampling, and data synthesis therefore fit well, and oversampling gave the highest average prediction accuracy, so it can be used to reconstruct forest fire datasets and build models for this area. FFMC, daily minimum relative humidity, and DC were the main meteorological drivers of forest fire occurrence in the study area.
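The four balancing strategies named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the helper names, the toy data (5 fire days out of 100), the 50/50 target ratio, and in particular the synthesis step, which uses simple interpolation between random pairs of minority samples as a SMOTE-like stand-in. The abstract does not state which synthesis algorithm or class ratio the authors used.

```python
import random
from collections import Counter


def split_by_class(samples):
    """Return (minority, majority) lists for binary (features, label) pairs."""
    fire = [s for s in samples if s[1] == 1]
    no_fire = [s for s in samples if s[1] == 0]
    return (fire, no_fire) if len(fire) <= len(no_fire) else (no_fire, fire)


def oversample(samples, rng):
    """Random oversampling: redraw minority rows with replacement up to majority size."""
    minority, majority = split_by_class(samples)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def undersample(samples, rng):
    """Random undersampling: keep a random majority subset of minority size."""
    minority, majority = split_by_class(samples)
    return minority + rng.sample(majority, len(minority))


def mixed_sample(samples, rng):
    """Mixed sampling: shrink the majority and grow the minority to a common midpoint."""
    minority, majority = split_by_class(samples)
    target = (len(minority) + len(majority)) // 2
    kept = rng.sample(majority, target)
    grown = minority + [rng.choice(minority) for _ in range(target - len(minority))]
    return kept + grown


def synthesize(samples, rng):
    """Interpolate between random minority pairs (simplified, SMOTE-like assumption)."""
    minority, majority = split_by_class(samples)
    label = minority[0][1]
    new_rows = []
    for _ in range(len(majority) - len(minority)):
        a = rng.choice(minority)[0]
        b = rng.choice(minority)[0]
        t = rng.random()  # position of the new point on the segment from a to b
        new_rows.append((tuple(x + t * (y - x) for x, y in zip(a, b)), label))
    return majority + minority + new_rows


def class_counts(samples):
    """Count samples per class label."""
    return Counter(label for _, label in samples)


# Toy data (made up): 5 fire days (label 1) and 95 non-fire days (label 0),
# each with two hypothetical weather features.
rng = random.Random(0)
data = [((rng.random(), rng.random()), 1) for _ in range(5)]
data += [((rng.random(), rng.random()), 0) for _ in range(95)]

over = oversample(data, rng)
under = undersample(data, rng)
mixed = mixed_sample(data, rng)
synth = synthesize(data, rng)

for name, balanced in [("over", over), ("under", under), ("mixed", mixed), ("synth", synth)]:
    print(name, dict(class_counts(balanced)))
```

Each balanced list could then be passed to a binomial logistic regression fit. Undersampling leaves only 10 rows in this toy case, which hints at why a model trained on an undersampled dataset may predict less accurately than the other three.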

Keywords

Forest fire / Meteorological factors / Dataset reconstruction / Logistic model

Funding

Shandong Police College Research Cultivation Special Project (YKYPYZX202305)

Publication year

2024

Fire Safety Science (火灾科学)

University of Science and Technology of China

Indexed in: Peking University Core Journals (北大核心)

Impact factor: 0.507

ISSN: 1004-5309

Number of references: 7