Research on a Credit Risk Evaluation Model Based on the RF-FL-LightGBM Algorithm

Abstract: To address high-dimensional, sparse customer credit features and class imbalance in big-data settings, and thereby improve the accuracy of customer credit evaluation, this paper proposes a credit risk evaluation model based on the RF-FL-LightGBM algorithm. First, random forest (RF) is used to rank and filter the high-dimensional features by importance, eliminating redundant and ineffective features that tend to cause overfitting. Second, a binary balanced cross-entropy loss function improved with Focal Loss (FL) is adopted as the loss function of the LightGBM model, mitigating the loss of accuracy caused by the imbalance between positive and negative samples and thereby improving classification performance. Experiments on the historical customer dataset of a financial leasing company show that the F1-score and AUC of the RF-FL-LightGBM model are significantly higher than those of the XGBoost and LightGBM models. The RF-FL-LightGBM algorithm not only handles high-dimensional, sparse, imbalanced sample data effectively, but also improves the classification accuracy of customer attributes and runs more efficiently.
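The abstract does not give the exact form of the FL loss, so the following is only a minimal sketch under an assumption: it uses the standard alpha-balanced focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and computes the per-sample loss, gradient, and Hessian with respect to the raw score, which are the quantities a gradient-boosting custom objective typically needs. The function name `focal_terms` and the defaults `alpha=0.25`, `gamma=2.0` are illustrative choices, not values taken from the paper.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def focal_terms(z: float, y: int, alpha: float = 0.25, gamma: float = 2.0):
    """Return (loss, grad, hess) of the alpha-balanced focal loss for one sample.

    z is the raw (pre-sigmoid) score and y is the label in {0, 1}.
    alpha and gamma are assumed hyperparameters (hypothetical defaults).
    Derivatives are taken with respect to z.
    """
    s = 1.0 if y == 1 else -1.0              # sign that maps z to the true class
    a_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    p_t = _sigmoid(s * z)                    # probability assigned to the true class
    p_t = min(max(p_t, 1e-12), 1.0 - 1e-12)  # keep log(p_t) finite
    log_p = math.log(p_t)
    q = 1.0 - p_t
    v = gamma * p_t * log_p + p_t - 1.0

    loss = -a_t * q ** gamma * log_p
    grad = s * a_t * q ** gamma * v
    hess = a_t * p_t * (
        q ** (gamma + 1.0) * (gamma * log_p + gamma + 1.0)
        - gamma * q ** gamma * v
    )
    return loss, grad, hess


if __name__ == "__main__":
    # A confidently correct sample versus a confidently wrong one (both y = 1).
    for z in (2.0, -2.0):
        loss, grad, hess = focal_terms(z, 1)
        print(f"z={z:+.1f} loss={loss:.4f} grad={grad:.4f} hess={hess:.4f}")
```

With `gamma=0` and `alpha=0.5` the expressions reduce to half the ordinary binary cross-entropy terms, which is a quick sanity check. For `gamma > 0`, well-classified samples contribute much less loss than misclassified ones, which is how a focal-style loss shifts attention toward the hard, often minority-class, samples.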

Keywords:

credit risk assessment; random forest; feature selection; Focal Loss; LightGBM algorithm

Authors:

Miao Yue (苗月), Wu Chen (吴陈)

Affiliation:

School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212000

Publication year:

2024

DOI:

10.3969/j.issn.1672-9722.2024.03.030

Computer & Digital Engineering (计算机与数字工程)

No. 709 Research Institute, China Shipbuilding Industry Corporation

CSTPCD

Impact factor: 0.355

ISSN: 1672-9722

Year, volume (issue): 2024, 52(3)

Number of references: 5