Multi-modal depression detection method based on cross-modal feature reconstruction and decoupling network
Depression is a widespread and severe mental health disorder that requires early detection for effective intervention. Automated depression detection that integrates audio and text modalities must address the challenges posed by information redundancy and modality heterogeneity. Previous studies often fail to capture the interaction between the audio and text modalities, limiting detection performance. To overcome these limitations, this study proposes a multi-modal depression detection method based on a cross-modal feature reconstruction and decoupling network (CFRDN). The method uses text as the core modality, guiding the model to reconstruct audio features for the cross-modal feature decoupling task. The framework then separates shared and private features from the text-guided reconstructed audio features for subsequent multimodal fusion. Extensive experiments on the DAIC-WOZ and E-DAIC datasets demonstrate that the proposed method outperforms state-of-the-art approaches on multimodal depression detection tasks.
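To make the pipeline concrete, the following is a minimal PyTorch sketch of the three stages the abstract describes, under assumed dimensions and module choices (the class name, cross-attention reconstruction, and linear decoupling heads are illustrative, not the paper's actual architecture): text features guide the reconstruction of audio features, the result is decoupled into shared and private components, and the three streams are fused for classification.

```python
# A hypothetical sketch of CFRDN-style processing; all module names and
# dimensions are assumptions, not the published implementation.
import torch
import torch.nn as nn

class CFRDNSketch(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, hidden=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.audio_proj = nn.Linear(audio_dim, hidden)
        # Text queries attend over audio features to reconstruct them,
        # realizing "text as the core modality" via cross-attention.
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=4,
                                                batch_first=True)
        # Decoupling heads: shared (text-aligned) vs. private (audio-specific)
        # subspaces; a hypothetical design choice.
        self.shared_head = nn.Linear(hidden, hidden)
        self.private_head = nn.Linear(hidden, hidden)
        # Fuse text + shared + private streams for binary detection.
        self.classifier = nn.Linear(hidden * 3, 2)

    def forward(self, text_feats, audio_feats):
        t = self.text_proj(text_feats)    # (B, T_text, H)
        a = self.audio_proj(audio_feats)  # (B, T_audio, H)
        # Text-guided reconstruction of audio features
        recon, _ = self.cross_attn(query=t, key=a, value=a)
        shared = self.shared_head(recon)    # features shared with text
        private = self.private_head(recon)  # audio-private residual
        # Mean-pool over time, then fuse the three streams
        fused = torch.cat([t.mean(1), shared.mean(1), private.mean(1)],
                          dim=-1)
        return self.classifier(fused)

# Usage with random tensors standing in for encoder outputs
model = CFRDNSketch()
logits = model(torch.randn(2, 50, 768), torch.randn(2, 200, 128))
print(logits.shape)  # torch.Size([2, 2])
```

In this sketch the decoupling is expressed only as two projection heads; the actual method would additionally need objectives (e.g., similarity and orthogonality losses) to force the shared and private subspaces apart, which are omitted here.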