Research on Credit Risk Prediction for Small and Medium-Sized Enterprises by Integrating Unstructured Textual Information

This study predicts the credit risk of small and medium-sized enterprises (SMEs) using an indicator system that incorporates unstructured textual information from annual reports and news reports. Recursive feature elimination (RFE) is used to screen the original indicators, and indicators such as annual report text complexity, annual report sentiment tone, and news sentiment polarity are then added. Using Bayesian-optimized XGBoost (BO-XGBoost) and other methods, the credit risk prediction performance of several machine learning models is compared across different feature sets. The SHAP (SHapley Additive exPlanations) interpretability method is used to give visual local and global explanations of the model. The results show that adding the unstructured textual indicators improves the performance of every model to varying degrees, which indicates that these features are useful predictors of SME credit risk. BO-XGBoost outperforms the baseline, and the unstructured textual features rank near the top in feature importance. SHAP waterfall, scatter, and dependence plots are used to explain the causes of misclassified cases, the direction and magnitude of each feature's effect on the model output, and how credit risk evolves with the unstructured textual features. Principal-agent theory and related theories provide further theoretical support for the empirical conclusions.
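The local explanations mentioned in the abstract rest on Shapley values, which split one prediction into additive per-feature contributions relative to a baseline input. The sketch below is only an illustration of that idea. It assumes a hand-written toy scoring function in place of the fitted BO-XGBoost model, hypothetical feature names and values, and an exact enumeration over feature orderings in place of the `shap` library's explainers. None of these come from the paper.

```python
from itertools import permutations


def risk_score(x):
    """Toy stand-in for a fitted model (hypothetical): higher means riskier."""
    score = 0.5 * x["leverage"] + 0.2 * x["report_complexity"]
    score -= 0.3 * x["report_tone"]
    # Interaction term: negative news weighs more on highly leveraged firms.
    score -= 0.4 * x["news_polarity"] * x["leverage"]
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all feature orderings. Features not yet switched on keep their
    baseline value."""
    contrib = {name: 0.0 for name in instance}
    orderings = list(permutations(instance))
    for order in orderings:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = instance[name]
            new = model(current)
            contrib[name] += new - prev
            prev = new
    return {name: total / len(orderings) for name, total in contrib.items()}


# Hypothetical reference firm and firm to explain (illustrative values only).
baseline = {"leverage": 0.5, "report_complexity": 0.5,
            "report_tone": 0.0, "news_polarity": 0.0}
firm = {"leverage": 0.9, "report_complexity": 0.8,
        "report_tone": -0.4, "news_polarity": -0.6}

phi = shapley_values(risk_score, firm, baseline)

# Rank features by absolute contribution, as a waterfall plot would.
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>18}: {value:+.3f}")
```

The contributions sum to the difference between the firm's score and the baseline score, which is the additivity property that a SHAP waterfall plot visualizes. Exact enumeration is only practical for a handful of features; tree-based explainers use faster algorithms for models such as XGBoost.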

Corporate credit risk prediction; annual report; news sentiment; SMEs; BO-XGBoost; SHAP

Meng Xiangjun (孟祥俊), Chen Jindong (陈进东), Zhang Jian (张健)

School of Economics and Management, Beijing Information Science and Technology University, Beijing 100192, China

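Beijing International Science and Technology Cooperation Base for Intelligent Decision-Making and Big Data Applications, Beijing 100192, China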

Credit risk prediction; annual report text; news sentiment; SMEs; BO-XGBoost; SHAP

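National Key Research and Development Program of China (2019YFB1405303); Beijing Municipal Universities Outstanding Young Talent Cultivation Program (BPHR202203233); General Program of the National Natural Science Foundation of China (72174018)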

2024

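Journal of Systems Science and Mathematical Sciences (系统科学与数学)
Academy of Mathematics and Systems Science, Chinese Academy of Sciences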

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.425
ISSN: 1000-0577
Year, volume (issue): 2024, 44(6)