Study on forest fire prediction based on data set reconstruction and logistic regression
Taking Huma County in the Daxinganling Region as an example, this study compared several data balancing methods, including oversampling, undersampling, mixed sampling, and data synthesis. Each method was used to reconstruct the forest fire data set, and a binomial logistic regression model was then fitted separately to each reconstructed data set. The data balancing method that yielded the best average prediction performance was selected for reconstructing and modeling the full sample data set. In addition, the standardized regression coefficient method was used to examine the relative importance of each meteorological factor. The results indicated no significant difference in goodness of fit among the oversampling, mixed sampling, and data synthesis methods; however, all three performed slightly better than the undersampling approach. In terms of prediction accuracy, the data sets ranked as oversampling > mixed sampling > data synthesis > undersampling; the first three differed little from one another, but all were significantly more accurate than the last. Among the meteorological factors, FFMC, daily minimum relative humidity, and DC had the largest absolute standardized regression coefficients, at 0.771, 0.720, and 0.527, respectively. The balanced data sets obtained by oversampling, mixed sampling, and data synthesis therefore gave better fits, and the oversampling data set, which had the highest average prediction accuracy, can be used to reconstruct forest fire data sets and build models for this area. FFMC, daily minimum relative humidity, and DC were the main meteorological drivers of forest fire occurrence in the study area.
Forest fire; Meteorological factors; Data set reconstruction; Logistic model
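The workflow summarized in the abstract (balance the data set, fit a binomial logistic regression, then compare standardized coefficients) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the variable names `ffmc` and `min_rh` and all numeric settings are hypothetical, random duplication of minority rows stands in for the oversampling step, and coefficients fitted on z-scored predictors stand in for the standardized regression coefficient method, whose exact formula the abstract does not give. It is not the study's implementation and does not reproduce its results.

```python
import math
import random
from statistics import mean, pstdev


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def standardize(X):
    """Z-score each column so fitted coefficients are comparable across predictors."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a binomial logistic regression by batch gradient descent.

    Returns (intercept, list of coefficients).
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for row, label in zip(X, y):
            err = sigmoid(b0 + sum(w * v for w, v in zip(b, row))) - label
            g0 += err
            for j in range(p):
                g[j] += err * row[j]
        b0 -= lr * g0 / n
        b = [w - lr * gj / n for w, gj in zip(b, g)]
    return b0, b


# Synthetic, hypothetical daily records: fires are rare, more likely at
# high ffmc and low min_rh. These are NOT the study's data.
rng = random.Random(42)
X, y = [], []
for _ in range(300):
    ffmc = rng.uniform(60, 95)
    min_rh = rng.uniform(20, 90)
    p_fire = sigmoid(-3 + 0.15 * (ffmc - 80) - 0.06 * (min_rh - 50))
    X.append([ffmc, min_rh])
    y.append(1 if rng.random() < p_fire else 0)

# Reconstruct (balance) the data set, then fit on z-scored predictors.
X_bal, y_bal = random_oversample(X, y, seed=1)
intercept, std_coefs = fit_logistic(standardize(X_bal), y_bal)

# Rank predictors by absolute standardized coefficient.
ranking = sorted(zip(["ffmc", "min_rh"], std_coefs), key=lambda t: -abs(t[1]))
print(ranking)
```

Because the predictors are z-scored before fitting, the absolute size of each coefficient can be compared directly across factors, which is the idea behind ranking relative importance. Undersampling, mixed sampling, or a synthesis method could be swapped in for `random_oversample` to compare balancing strategies on the same model.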