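Research on Credit Risk Prediction for Small and Medium-Sized Enterprises by Integrating Unstructured Textual Information (融合非结构化文本信息的中小企业信用风险预测研究)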

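Chinese abstract: This study predicts the credit risk of small and medium-sized enterprises (SMEs) using an indicator system that integrates unstructured textual information from annual reports and news reports. Recursive feature elimination is used to screen the original indicators, which are then supplemented with indicators such as annual report text complexity, annual report sentiment tone, and news sentiment polarity. Using Bayesian-optimized XGBoost (BO-XGBoost) and other methods, the credit risk prediction performance of multiple machine learning models is compared across different feature sets. The SHAP (SHapley Additive exPlanations) interpretability method is used to provide visual local and global explanations of the models. The results show that adding the unstructured textual feature indicators improves the performance of every model to varying degrees, indicating that these features have good predictive power for SME credit risk; that BO-XGBoost outperforms the baseline, with the unstructured textual features ranking near the top in feature importance; and that SHAP waterfall, scatter, and dependence plots explain the causes of misclassified cases, the direction and magnitude of each feature's effect on the model output, and the evolving relationship between unstructured textual features and credit risk. Principal-agent theory and related theories are used to strengthen the theoretical support for the empirical conclusions.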

English title: Research on Credit Risk Prediction for Small and Medium-Sized Enterprises by Integrating Unstructured Textual Information

English abstract: This study focuses on the prediction of credit risk for small and medium-sized enterprises (SMEs) by leveraging a comprehensive indicator system that incorporates unstructured textual information such as annual report texts and news reports. The Recursive Feature Elimination (RFE) method is used to select the original indicators, and indicators such as annual report text complexity, annual report sentiment tendency, and news sentiment polarity for SMEs are incorporated. Using Bayesian optimization-based XGBoost (BO-XGBoost) and other methods, the predictive performance of various machine learning models is compared across different sets of feature attributes. Furthermore, the SHAP (SHapley Additive exPlanations) interpretability method is employed to provide visual explanations of the model at both the local and global levels. The research demonstrates that the inclusion of unstructured textual feature indicators significantly enhances the predictive performance of the models, highlighting the valuable predictive role of these features in assessing credit risk for SMEs. BO-XGBoost outperforms the baseline in predictive performance, and the unstructured textual features rank highly in terms of importance. The SHAP waterfall plot, scatter plot, and dependence plot are used to explain the reasons for misjudged cases, the polarity and degree of the features' impact on the model's output, and the evolutionary trends between unstructured textual features and credit risk. The empirical conclusions are further supported theoretically by principal-agent theory and other theories.

English keywords: Corporate credit risk prediction; annual report; news sentiment; SMEs; BO-XGBoost; SHAP

Authors: 孟祥俊 (Meng Xiangjun), 陈进东 (Chen Jindong), 张健 (Zhang Jian)

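Author affiliations:

School of Economics and Management, Beijing Information Science and Technology University, Beijing 100192

Beijing International Science and Technology Cooperation Base of Intelligent Decision and Big Data Application, Beijing 100192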

Keywords: credit risk prediction; annual report text; news sentiment; small and medium-sized enterprises; BO-XGBoost; SHAP

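Funding: National Key Research and Development Program of China; Beijing Municipal Universities Outstanding Young Talent Cultivation Program; National Natural Science Foundation of China (General Program)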

Project numbers: 2019YFB1405303; BPHR202203233; 72174018

Publication year: 2024

DOI: 10.12341/jssmsKSS23873

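Journal: Journal of Systems Science and Mathematical Sciences (系统科学与数学)

Publisher: Academy of Mathematics and Systems Science, Chinese Academy of Sciences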

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.425

ISSN: 1000-0577

Year, volume (issue): 2024, 44(6)

Number of references: 41