Default prediction model based on optimal combination of big data variables: A case study of Chinese small enterprises

To improve commercial banks' ability to identify and manage the credit risk of their corporate customers, we propose a systematic approach to default prediction. First, to build a high-dimensional big data variable set, we derive the optimal cutoff points for partitioning each indicator's range by minimizing the Gini index, so that every path in the decision tree group separates defaulting from non-defaulting customers as sharply as possible. Each path is treated as a dummy variable that equals 1 if the customer belongs to that path and 0 otherwise. Second, to reduce the dimensionality of the dummy variables, we infer the optimal combination of dummy variables as the one that minimizes the default prediction error of Lasso regression. Third, we set the default prediction threshold of the logistic regression model at the point that maximizes the sum of the correct classification rates of defaulting and non-defaulting customers, which improves the accuracy of identifying defaulting firms.

The empirical results show the following. 1) Decision tree path variables have stronger default discriminatory power than raw credit indicators and carry richer information. 2) Net profit cash content, per capita disposable income of urban residents, and enterprises' legal disputes have a significant impact on default prediction for Chinese small enterprises: these three indicators make up only 3.704% of all indicators but account for 41.639% of the contribution to prediction accuracy. 3) The proposed method outperforms the comparison models in accuracy and robustness; it reveals the key factors and key thresholds that affect enterprise credit risk and thus provides a basis for commercial banks' credit approval and pre-loan review. The method's effectiveness has been verified on multiple credit datasets, and it can be extended to default prediction models for individuals and for large and medium-sized enterprises.
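The first step (Gini-optimal cutoffs, with each tree path turned into a dummy variable) can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' code: the helper names (`best_split`, `grow_paths`, `path_dummies`, `tree_group_paths`), the midpoint cutoffs, the depth limit, and building the "tree group" as one tree per indicator subset are all hypothetical choices.

```python
from typing import List, Optional, Sequence, Tuple

Row = List[float]
Condition = Tuple[int, str, float]  # (indicator index, "<=" or ">", cutoff)
Path = List[Condition]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of 0/1 labels (1 = default)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(X: List[Row], y: List[int], feature: int) -> Optional[Tuple[float, float]]:
    """Cutoff on one indicator that minimises the weighted Gini index of the two children."""
    values = sorted(set(row[feature] for row in X))
    n = len(y)
    best = None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2.0  # hypothetical choice: midpoint of adjacent distinct values
        left = [lab for row, lab in zip(X, y) if row[feature] <= cut]
        right = [lab for row, lab in zip(X, y) if row[feature] > cut]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (cut, score)
    return best


def grow_paths(X: List[Row], y: List[int], features: Sequence[int],
               max_depth: int, path: Optional[Path] = None) -> List[Path]:
    """Grow a Gini tree on the given indicators and return every root-to-leaf path."""
    path = [] if path is None else path
    if max_depth == 0 or gini(y) == 0.0:
        return [path]
    splits = [(f, best_split(X, y, f)) for f in features]
    splits = [(f, s) for f, s in splits if s is not None]
    if not splits:  # no indicator varies inside this node
        return [path]
    f, (cut, _) = min(splits, key=lambda item: item[1][1])
    paths: List[Path] = []
    for op in ("<=", ">"):
        keep = [i for i, row in enumerate(X) if (row[f] <= cut) == (op == "<=")]
        paths += grow_paths([X[i] for i in keep], [y[i] for i in keep],
                            features, max_depth - 1, path + [(f, op, cut)])
    return paths


def on_path(row: Row, path: Path) -> bool:
    """True if the customer satisfies every condition along the path."""
    return all(row[f] <= c if op == "<=" else row[f] > c for f, op, c in path)


def path_dummies(X: List[Row], paths: List[Path]) -> List[List[int]]:
    """One 0/1 dummy per path: 1 if the customer falls on that path, else 0."""
    return [[1 if on_path(row, p) else 0 for p in paths] for row in X]


def tree_group_paths(X: List[Row], y: List[int],
                     feature_subsets: Sequence[Sequence[int]], max_depth: int = 2) -> List[Path]:
    """Hypothetical 'tree group': one tree per indicator subset, paths pooled."""
    paths: List[Path] = []
    for subset in feature_subsets:
        paths += grow_paths(X, y, subset, max_depth)
    return paths


X = [[0.1, 5.0], [0.2, 3.0], [0.8, 4.0], [0.9, 1.0]]  # two indicators, four customers
y = [0, 0, 1, 1]
paths = grow_paths(X, y, features=[0, 1], max_depth=2)
print(paths)                   # [[(0, '<=', 0.5)], [(0, '>', 0.5)]]
print(path_dummies(X, paths))  # [[1, 0], [1, 0], [0, 1], [0, 1]]
```

Because the leaves of one tree partition the sample, each customer has exactly one 1 among a given tree's dummies; pooling several trees produces the high-dimensional, partly redundant dummy set that the Lasso step then prunes.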
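For the second step, the sketch below fits a Lasso by cyclic coordinate descent on the 0/1 dummies and keeps the penalty with the smallest prediction error; the dummies with non-zero coefficients form the selected combination. Assumptions: it uses a squared-error Lasso and a single holdout set as the error criterion (the paper may use a logistic Lasso or cross-validation), and `lasso_cd` and `select_dummies` are hypothetical names. In practice a library routine such as scikit-learn's `LassoCV` would usually be used; plain Python keeps the sketch dependency-free.

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: List[List[float]], y: Sequence[float], lam: float,
             max_sweeps: int = 1000, tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Squared-error Lasso by cyclic coordinate descent.

    Minimises (1/2n) * sum((y - b0 - X beta)^2) + lam * sum(|beta|);
    the intercept b0 is not penalised (handled by centring).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]  # centred dummies
    resid = [yi - y_mean for yi in y]  # residuals while every beta is 0
    beta = [0.0] * p
    for _ in range(max_sweeps):
        max_change = 0.0
        for j, col in enumerate(cols):
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant dummy carries no information and stays at 0
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b0 = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return b0, beta


def select_dummies(X_tr, y_tr, X_va, y_va, lambdas):
    """Pick the penalty with the lowest validation MSE; return (lam, kept dummy indices)."""
    best = None
    for lam in lambdas:
        b0, beta = lasso_cd(X_tr, y_tr, lam)
        preds = [b0 + sum(b * x for b, x in zip(beta, row)) for row in X_va]
        mse = sum((p - t) ** 2 for p, t in zip(preds, y_va)) / len(y_va)
        if best is None or mse < best[0]:
            best = (mse, lam, [j for j, b in enumerate(beta) if b != 0.0])
    return best[1], best[2]


# Toy data: dummy 0 tracks default exactly, dummy 1 is unrelated to it.
X = [[0, 1], [0, 0], [1, 0], [1, 1], [0, 1], [1, 0], [0, 0], [1, 1]]
y = [0, 0, 1, 1, 0, 1, 0, 1]
# Reusing the training rows as the validation set only to keep the example short.
print(select_dummies(X, y, X, y, [0.3, 0.1, 0.01]))  # (0.01, [0])
```

The soft-thresholding step is what sets uninformative dummies exactly to zero, so the surviving indices can be read off directly as the retained path variables.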
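The third step, choosing the cutoff that maximizes the sum of the two correct-classification rates, amounts to a scan over candidate thresholds. In this sketch `pd_scores` are predicted default probabilities (for example, from the logistic regression), a customer is flagged as a defaulter when the score is at or above the cutoff, and the candidates are the observed scores; these conventions and the function name `best_threshold` are our assumptions.

```python
from typing import Sequence, Tuple


def best_threshold(pd_scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, default hit rate + non-default hit rate) at the best cutoff."""
    n_def = sum(labels)
    n_good = len(labels) - n_def
    if n_def == 0 or n_good == 0:
        raise ValueError("need both defaulted and non-defaulted customers")
    best_cut, best_sum = None, -1.0
    for cut in sorted(set(pd_scores)):
        # defaulters correctly flagged (score >= cutoff)
        hit_def = sum(1 for s, lab in zip(pd_scores, labels) if lab == 1 and s >= cut)
        # non-defaulters correctly passed (score < cutoff)
        hit_good = sum(1 for s, lab in zip(pd_scores, labels) if lab == 0 and s < cut)
        total = hit_def / n_def + hit_good / n_good
        if total > best_sum:
            best_cut, best_sum = cut, total
    return best_cut, best_sum


scores = [0.05, 0.10, 0.20, 0.30, 0.60, 0.70]
labels = [0, 0, 0, 1, 0, 1]
print(best_threshold(scores, labels))  # (0.3, 1.75)
```

With these toy scores the scan picks 0.3, where both defaulters are caught and three of four non-defaulters pass (1.0 + 0.75 = 1.75); the conventional 0.5 cutoff would give only 0.5 + 0.75 = 1.25, illustrating the gain from tuning the cutoff when defaults are rare.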
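As a quick arithmetic check on the 3.704% count share: if three indicators make up that share of n indicators, then 3/n ≈ 0.03704, which points to n = 81. The total of 81 is our inference from the ratio; this page does not state it.

```latex
\frac{3}{n} \approx 0.03704 \;\Rightarrow\; n \approx \frac{3}{0.03704} \approx 80.99 \approx 81,
\qquad \frac{3}{81} = 0.037037\ldots \approx 3.704\%
```

Neighbouring totals do not fit: 3/80 = 3.75% and 3/82 ≈ 3.659%.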

Keywords: default prediction; big data; decision tree path variables; Lasso; default prediction threshold

Shen Long, Zhou Ying


School of Economics and Management, Dalian University of Technology, Dalian 116024, China


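Funding: National Natural Science Foundation of China, General Program (72071026); Liaoning Provincial Social Science Planning Fund (L21BGL011); Liaoning Provincial Federation of Social Sciences, General Project (2023lslybkt-025)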

2024

Systems Engineering-Theory & Practice
Systems Engineering Society of China

Indexed in: CSTPCD; CSSCI; Peking University Core Journals
Impact factor: 1.575
ISSN: 1000-6788
Year, volume (issue): 2024, 44(3)