Water Supply Pipeline Burr Data Detection Based on Improved LightGBM by Focal Loss

To address the low recall of burr data detection in water supply pipeline networks caused by data imbalance, this paper proposes a burr data detection method based on LightGBM improved with Focal Loss. First, neighborhood-related features are constructed according to the characteristics of pipeline burr data. Second, the Focal Loss function is introduced into LightGBM to increase the weight the model places on hard-to-detect burr samples, and different Focal Loss parameter values are tested to balance precision and recall. Finally, models trained with different Focal Loss parameters are fused to further improve detection performance on imbalanced burr data. Experiments on real data from a city's water supply pipeline network show that, compared with a single model based on the cross-entropy loss function, the proposed fused model with Focal Loss improves recall and F1 score on burr data by 33.3 and 18 percentage points, respectively, although the precision on burr data still needs further improvement. By starting from the loss function and dynamically adjusting the weights of hard and easy samples, the proposed method effectively improves burr data detection under imbalanced data.
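The abstract does not give the loss formula or the parameter values used, so the following is only a minimal sketch of how a Focal Loss could supply the per-sample gradient and Hessian that a gradient-boosting custom objective needs. It assumes the standard binary form FL = -alpha_t * (1 - p_t)^gamma * log(p_t) applied to a raw score z, and it approximates the derivatives by finite differences. The function names and the defaults `alpha=0.25`, `gamma=2.0` are illustrative assumptions, not values taken from the paper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def focal_loss(z, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one sample with raw score z and label y in {0, 1}.

    alpha and gamma are assumed defaults, not the paper's settings.
    """
    p = sigmoid(z)
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = max(p_t, 1e-12)                   # guard against log(0)
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)


def focal_grad_hess(z, y, alpha=0.25, gamma=2.0, eps=1e-3):
    """First and second derivative of the loss w.r.t. z (central differences)."""
    f0 = focal_loss(z, y, alpha, gamma)
    fp = focal_loss(z + eps, y, alpha, gamma)
    fm = focal_loss(z - eps, y, alpha, gamma)
    grad = (fp - fm) / (2.0 * eps)
    hess = (fp - 2.0 * f0 + fm) / (eps * eps)
    return grad, hess


def focal_objective(raw_scores, labels, alpha=0.25, gamma=2.0):
    """Return per-sample gradients and Hessians as two lists."""
    pairs = [focal_grad_hess(z, y, alpha, gamma)
             for z, y in zip(raw_scores, labels)]
    return [g for g, _ in pairs], [h for _, h in pairs]


if __name__ == "__main__":
    # A confidently correct positive (z = 2) versus a missed positive (z = -2).
    grads, hessians = focal_objective([2.0, -2.0], [1, 1])
    print("gradients:", grads)
    print("hessians:", hessians)
```

With `gamma=0` and `alpha=0.5` the loss reduces to half of the usual cross-entropy, which gives a simple sanity check. For larger `gamma`, well-classified samples contribute much smaller gradients than misclassified ones, which is the reweighting of hard samples that the abstract describes. A function of this shape could be adapted to a boosting library's custom-objective interface, which expects the first and second derivatives with respect to the raw score.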

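Keywords: anomaly detection; Focal Loss; LightGBM; imbalanced data; burr data

Xue Hao (薛浩), Ma Jing (马静), Guo Xiaoyu (郭小宇)

College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China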

Funding: National Natural Science Foundation of China, General Program (72174086)

2024

Computer and Modernization (计算机与现代化)
Jiangxi Computer Society; Jiangxi Institute of Computing Technology

CSTPCD
Impact factor: 0.472
ISSN: 1006-2475
Year, volume (issue): 2024, (9)