Abstract
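Algorithms for tracking saltating sand particles need an automatic method for identifying saltation trajectories that can replace manual identification. Using a self-built saltation-trajectory dataset, this study develops four optimized ensemble learning models (extremely randomized trees, random forest, gradient boosting decision tree, and XGBoost) to identify saltation trajectories automatically. The results show the following. All four models classify saltation trajectories well, which reflects the particular advantage of ensemble learning models on nonlinear problems of this kind. Among the models studied, the extremely randomized trees model achieves the highest accuracy (0.9035), precision (0.9030), recall (0.9035), F1 score (0.8995), MCC (0.7378), and AUC score (0.9179), but also has the highest time cost. The XGBoost model combines good predictive scores with a lower time cost. The former is therefore suited to offline automatic identification of saltation trajectories, while the latter has potential for tracking sand particles online. Finally, parameterization schemes such as adding the variances of the instantaneous horizontal and vertical velocities not only improve the dataset but also further improve the predictive performance of the extremely randomized trees model.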
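The velocity-variance parameterization mentioned above can be sketched as a per-track feature computation. This is a minimal illustration under assumptions that the abstract does not state. Each track is taken to be a time-ordered list of (t, x, z) positions. Instantaneous velocities are estimated by forward differences. The function name `velocity_spread_features` is hypothetical. The study's actual feature definitions may differ. The sketch returns both the variance and its square root, the standard deviation.

```python
from math import sqrt
from statistics import pvariance

def velocity_spread_features(track):
    """Velocity-spread features for one candidate saltation track.

    track: time-ordered list of (t, x, z) samples, where x is the horizontal and
    z the vertical particle position (hypothetical input format).
    Instantaneous velocities are estimated by forward differences between
    consecutive samples; the function returns the population variance and
    standard deviation of the horizontal (u) and vertical (w) velocities.
    """
    if len(track) < 3:
        raise ValueError("need at least 3 samples to get 2 velocity estimates")
    u, w = [], []
    for (t0, x0, z0), (t1, x1, z1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        u.append((x1 - x0) / dt)  # instantaneous horizontal velocity
        w.append((z1 - z0) / dt)  # instantaneous vertical velocity
    var_u, var_w = pvariance(u), pvariance(w)
    return {"var_u": var_u, "var_w": var_w,
            "std_u": sqrt(var_u), "std_w": sqrt(var_w)}

# Usage: four frames of a made-up track (time in frames, positions in pixels).
features = velocity_spread_features([(0, 0, 0), (1, 1, 2), (2, 3, 3), (3, 6, 3)])
print(features)
```

Each returned value would be appended as an extra column to that track's feature vector before training a classifier. The idea is that a more irregular velocity history raises the variance.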
Abstract
Establishing automatic identification of saltating tracks is essential for tracking sand particles. Four ensemble models (Extremely randomized trees, Random forests, XGBoost, and Gradient Boosting Decision Tree) were therefore trained on our self-constructed datasets to identify saltating tracks. First, all the models perform well despite the limited discriminability of the dataset, which suggests that they have an advantage in handling nonlinear relationships. Second, the Extremely randomized trees model achieves the highest accuracy (0.9035), precision (0.9030), recall (0.9035), F1 score (0.8995), MCC (0.7378), and AUC score (0.9179), but it also has the highest time cost. The XGBoost model offers the best balance between high scores and low time cost. This implies that the former is best suited to identifying saltating tracks offline and that the latter is promising for tracking sand particles online. Finally, the improved datasets, which incorporate the standard deviations of the instantaneous horizontal and vertical velocities, significantly enhance the predictive performance of Extremely randomized trees. This study effectively reduces the time cost of manual trajectory verification and broadens the application of machine learning in saltation research.
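The scores reported above can be computed from a classifier's hard predictions. The sketch below rests on several assumptions. Labels are binary (1 = saltating track, 0 = other, a hypothetical coding). Per-class precision, recall, and F1 are averaged with support weights. MCC uses the standard two-class formula. The helper names are illustrative. Weighted averaging is itself an assumption, but it is consistent with the reported recall being equal to the accuracy (0.9035): support-weighted recall always reduces to accuracy. AUC is omitted because it needs predicted probabilities rather than hard labels.

```python
from collections import Counter
from math import sqrt

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 over all classes."""
    n = len(y_true)
    support = Counter(y_true)            # number of true samples per class
    predicted = Counter(y_pred)          # number of predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # per-class TP
    scores = {"accuracy": sum(hits.values()) / n,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for k in support:  # classes absent from y_true would get zero weight anyway
        tp = hits[k]
        prec = tp / predicted[k] if predicted[k] else 0.0
        rec = tp / support[k]
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[k] / n
        scores["precision"] += weight * prec
        scores["recall"] += weight * rec
        scores["f1"] += weight * f1
    return scores

def binary_mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Usage with made-up labels (1 = saltating track, 0 = other; hypothetical coding).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
scores = weighted_scores(y_true, y_pred)
mcc = binary_mcc(y_true, y_pred)
print(scores, mcc)
```

scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="weighted")` and `matthews_corrcoef` compute the same quantities. The pure-Python version only makes the formulas explicit.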