Detecting Depression Factors with Gradient Boosting Tree and Explainable Machine Learning Model SHAP
[Objective] This study constructs a predictive model for depression severity and examines its interpretability. We aim to improve the reliability and practicality of automated depression detection by analyzing Internet user-generated content. [Methods] First, we built a corpus by collecting depression-related medical consultations from the Good Doctor Online platform. Second, we extracted patients' psychological features using C-LIWC, a psychology lexicon. Third, we predicted the patients' conditions with the Gradient Boosting Tree algorithm. We also applied the explainable machine learning method SHAP to interpret the new model. Using SHAP's visualizations, we analyzed the complex relationships between the occurrence of depression and patients' age, gender, cognition, emotions, perceptions, social/family contexts, and personal gains or losses. [Results] The psychological state of patients with depression reflected their condition. Psychological features extracted from consultation records effectively detected severe depression, with an accuracy of 86%. SHAP reveals multiple effects of patients' psychological features on depression. [Limitations] Limited by the corpus, predictions of depression severity were based only on single consultation records. Additionally, the model features were drawn only from psychological dictionaries; more elements related to the risk of depression could be included in the future. [Conclusions] The factors influencing the occurrence and development of depression are complex. Because of individual differences, the same characteristics affect disease prediction differently across patients. Building an automated diagnostic model for depression should focus on both the model's accuracy and the understanding of the model's predictions.
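To make the interpretation step more concrete, the following is a minimal sketch of the Shapley-value attribution that SHAP is built on, written with the Python standard library only. It is an illustration under stated assumptions, not the study's implementation: the feature names, the baseline, and the scoring function `toy_model` are hypothetical stand-ins for the C-LIWC features and the trained Gradient Boosting Tree model, and the brute-force enumeration of feature subsets is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to baseline.

    Features outside a coalition are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        point = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(point)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of `name` when added to coalition s.
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_model(f):
    # Hypothetical risk score (an assumption, not the trained model):
    # two linear terms plus one interaction between two features.
    return (0.5 * f["negative_emotion"]
            - 0.3 * f["social"]
            + 0.2 * f["negative_emotion"] * f["cognition"])


# Hypothetical feature values for one consultation record and a zero baseline.
x = {"negative_emotion": 2.0, "social": 1.0, "cognition": 1.0}
baseline = {"negative_emotion": 0.0, "social": 0.0, "cognition": 0.0}

phi = shapley_values(toy_model, x, baseline)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.3f}")
```

In this toy setting the contributions sum to the difference between the score at `x` and the score at the baseline, and the interaction term is split equally between the two features involved. That additivity is what lets a single prediction be decomposed feature by feature, so the same feature can push the prediction in different directions for different patients.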