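A Study of Factors Influencing Depression, Combining the Gradient Boosting Tree Algorithm with the Explainable Machine Learning Model SHAP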

Detecting Depression Factors with Gradient Boosting Tree and Explainable Machine Learning Model SHAP

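Abstract (translated from Chinese): [Objective] This study examines how to build a prediction model for depression severity and how to interpret it. By analyzing Internet user-generated content, it advances research on depression risk prediction and aims to improve the reliability and practicality of automated depression detection models. [Methods] We built a corpus from text records of depression-related medical consultations on the Good Doctor Online (好大夫在线) platform. Using a psychology lexicon, we extracted patients' psychological features and predicted patients' conditions with the gradient boosting tree algorithm. We also introduced the explainable machine learning method SHAP to interpret the model, using SHAP's distinctive visualizations to examine the complex relationships between depression and patients' age, gender, cognition, emotion, perception, social and family context, and personal gains and losses. [Results] Patients' psychological states reflect their condition. Psychological features extracted from consultation records effectively detect severe depression, with an accuracy of 86%. SHAP explained the model's predictions and revealed the multiple effects that patients' psychological features at each level have on the occurrence of depression. [Limitations] Constrained by the corpus, depression severity was predicted from single consultation records only. The model features are based on a psychology lexicon, and more factors related to depression risk could be included in the model. [Conclusions] The factors that influence the onset and development of depression are complex. Individual differences cause each feature to affect disease prediction differently. Building an automated diagnostic model for depression requires attention not only to the model's accuracy but also to a better understanding of its predictions.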
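The lexicon-based feature extraction step mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the toy lexicon, its category names, and the `extract_features` helper are hypothetical and are not taken from the paper, and whitespace tokenization stands in for the word segmentation that Chinese text would need first.

```python
from collections import Counter

# Hypothetical toy lexicon (category -> words). A real study would load a
# psychology lexicon; these entries are invented for illustration only.
TOY_LEXICON = {
    "negative_emotion": {"sad", "hopeless", "anxious"},
    "cognition": {"think", "because", "know"},
    "family": {"mother", "father", "family"},
}


def extract_features(text, lexicon=TOY_LEXICON):
    """Return, per lexicon category, the share of tokens that match it."""
    tokens = text.lower().split()
    counts = Counter()
    for token in tokens:
        word = token.strip(".,!?;:")  # drop trailing punctuation
        for category, words in lexicon.items():
            if word in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {category: counts[category] / total for category in lexicon}


features = extract_features("I feel sad and hopeless because my family argues.")
```

Each consultation record becomes one fixed-length vector of category shares, which is the kind of input a gradient boosting classifier can consume.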

Abstract (English, as published): [Objective] This study constructs a predictive model for depression severity and explores its interpretability. We aim to improve the reliability and practicality of automated depression detection models by analyzing Internet user-generated content. [Methods] First, we built a corpus by collecting depression-related medical consultations from the Good Doctor Online platform. Then, we extracted patients' psychological features using C-LIWC, a psychology lexicon. Third, we predicted the patients' conditions with the Gradient Boosting Tree algorithm. The study also incorporated the explainable machine learning method SHAP to interpret the model. Through SHAP's visualizations, we analyzed the complex relationship between the occurrence of depression and patients' age, gender, cognition, emotions, perceptions, social and family contexts, and personal gains or losses. [Results] The psychological state of patients with depression reflects their condition. Psychological features extracted from consultation records effectively detected severe depression, with an accuracy of 86%. SHAP reveals multiple effects of patients' psychological features on depression. [Limitations] Limited by the corpus, predictions of depression severity were based only on single consultation records. Additionally, the model features were based on a psychology lexicon, and more elements related to the risk of depression could be included in the future. [Conclusions] Factors influencing the occurrence and development of depression are complex. Individual differences cause the same characteristics to affect disease prediction differently. Building an automated diagnostic model for depression should not only focus on the model's accuracy but also enhance understanding of the model's predictions.
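SHAP attributes a prediction to individual features using Shapley values. As a rough illustration of that idea, the sketch below computes exact Shapley values by brute force for a tiny hand-written scoring function. It is not the paper's model or the `shap` library's algorithm: `toy_score`, the feature names, and the all-zero baseline are hypothetical, and replacing absent features with a single baseline value is a simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to baseline.

    Features outside a coalition take their baseline value. The cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical risk score with an interaction term (illustration only).
def toy_score(f):
    neg_emotion, family, age = f
    return 2.0 * neg_emotion + 1.0 * family + 0.5 * neg_emotion * age


x = [1.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
```

The attributions in `phi` add up to `toy_score(x) - toy_score(baseline)`, and the interaction term is shared between the two features involved in it. This is how one feature's contribution can depend on another, such as age in this toy example.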

Keywords (English, as published):

Depression Prediction; Online User-Generated Content; Interpretable Machine Learning; Light Gradient Boosting Machine

Authors:

Nie Hui (聂卉), Wu Xiaoyan (吴晓燕)

Affiliation:

School of Information Management, Sun Yat-sen University, Guangzhou 510275

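Keywords (translated from Chinese):

Depression prediction; online user-generated content; explainable machine learning; gradient boosting tree algorithm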

Funding:

Guangzhou Social Science Fund (2022)

Project number:

10000-42220402

Publication year:

2024

DOI:

10.11925/infotech.2096-3467.2023.0052

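Journal: Data Analysis and Knowledge Discovery (数据分析与知识发现)

Publisher: National Science Library, Chinese Academy of Sciences (中国科学院文献情报中心)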

CSTPCD; CSSCI; CHSSCD; Peking University Core Journals (北大核心); EI

Impact factor: 1.452

ISSN: 2096-3467

Year, volume (issue): 2024, 8(3)

Number of references: 32