Research on Credit Risk Prediction for Small and Medium-Sized Enterprises by Integrating Unstructured Textual Information
This study predicts credit risk for small and medium-sized enterprises (SMEs) using a comprehensive indicator system that incorporates unstructured textual information such as annual report texts and news reports. The Recursive Feature Elimination (RFE) method is used to select from the original indicators, and text-derived indicators such as annual report text complexity, annual report sentiment tendency, and news sentiment polarity are added. Using Bayesian optimization-based XGBoost (BO-XGBoost) and other methods, the predictive performance of various machine learning models is compared across different feature sets. Furthermore, the SHAP (SHapley Additive exPlanations) interpretability method is employed to provide visual explanations of the model at both the local and global levels. The results show that including unstructured textual features significantly enhances the predictive performance of the models, highlighting the predictive value of these features in assessing SME credit risk. BO-XGBoost outperforms the baseline models, and the unstructured textual features rank highly in terms of importance. The SHAP waterfall plot, scatter plot, and dependence plot are used to explain the reasons for misjudged cases, the direction and magnitude of each feature's impact on the model's output, and the trends in the relationship between unstructured textual features and credit risk. The empirical conclusions are further supported theoretically by principal-agent theory and other theories.
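To make the local-explanation idea behind SHAP concrete, the following is a minimal sketch of exact Shapley value attribution on a toy scoring function. It is not the study's BO-XGBoost model or the SHAP library: the feature names, weights, instance, and baseline values are all hypothetical and chosen only for illustration. The sketch shows the additive property that a waterfall plot visualizes, namely that the per-feature contributions sum to the difference between the prediction for one firm and the prediction at a reference point.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names, loosely echoing the indicators in the abstract.
FEATURES = ["debt_ratio", "report_complexity", "news_sentiment"]


def toy_risk_score(x):
    """Illustrative risk score (an assumption, not the paper's model)."""
    return (
        0.5 * x["debt_ratio"]
        + 0.3 * x["report_complexity"]
        - 0.4 * x["news_sentiment"]
        + 0.2 * x["debt_ratio"] * x["report_complexity"]  # interaction term
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley values, replacing absent features with baseline values."""
    n = len(FEATURES)

    def value(subset):
        # Features in `subset` take the instance value, the rest the baseline.
        mixed = {f: instance[f] if f in subset else baseline[f] for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical firm and reference point.
instance = {"debt_ratio": 0.8, "report_complexity": 0.6, "news_sentiment": -0.5}
baseline = {"debt_ratio": 0.4, "report_complexity": 0.5, "news_sentiment": 0.0}

phi = shapley_values(toy_risk_score, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.4f}")
```

Enumerating every feature subset is only feasible for a handful of features. For tree ensembles such as XGBoost, the SHAP library uses a much more efficient tree-specific algorithm, so this sketch conveys the concept rather than a practical procedure.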