Parkinson's Disease Detection Model Based on Hierarchical Fusion of Multi-type Speech Information
Speech data for Parkinson's disease detection typically includes sustained vowels, repeated syllables, and contextual dialogues. Most existing models take a single type of speech data as input, which makes them susceptible to noise and limits their robustness. The current challenge in Parkinson's disease detection is to integrate different types of speech data effectively and to extract the critical pathological information they contain. This paper proposes a Parkinson's disease detection method based on hierarchical fusion of multi-type speech information, aiming to extract rich and comprehensive pathological information and to achieve better detection performance. First, various acoustic features are extracted from the different types of Parkinson's disease speech data. Then, a representation learning scheme is designed to mine deep information from these multiple types of acoustic features. Extracting articulation and rhythm information allows the underlying pathological information in the acoustic features to be reflected more accurately. Furthermore, a decoupled representation learning space is designed for these two types of information, extracting their respective private features while simultaneously learning their shared representation. Finally, a cross-type attention hierarchical fusion module progressively fuses the shared and private representations using cross-attention mechanisms at different granularities, with the aim of enhancing Parkinson's disease detection performance. Experiments on a publicly available Italian Parkinson's disease speech dataset and a self-collected Chinese Parkinson's disease speech dataset show that the proposed approach improves detection accuracy.
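The progressive fusion of shared and private representations described above can be illustrated with a minimal sketch. It is not the module proposed in this paper: the function names (`cross_attention`, `hierarchical_fuse`), the two-stage ordering, the mean pooling, and the absence of learned projection matrices are all assumptions made for illustration. It only shows how scaled dot-product cross-attention lets one feature stream attend to another, first against the shared representation and then across the articulation and rhythm streams.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product cross-attention without learned projections.

    Each query vector is replaced by a softmax-weighted average of the
    context vectors. Both inputs are lists of equal-length vectors.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * c[j] for w, c in zip(weights, context)) for j in range(dim)]
        )
    return attended


def mean_pool(frames):
    """Average a sequence of vectors into a single vector."""
    n = len(frames)
    return [sum(f[j] for f in frames) / n for j in range(len(frames[0]))]


def hierarchical_fuse(shared, private_articulation, private_rhythm):
    """Hypothetical two-stage fusion (an assumption, not the paper's design).

    Stage 1: each private stream attends to the shared representation.
    Stage 2: the two enriched streams attend to each other.
    The result is the concatenation of the two pooled streams.
    """
    art = cross_attention(private_articulation, shared)
    rhy = cross_attention(private_rhythm, shared)
    art_fused = cross_attention(art, rhy)
    rhy_fused = cross_attention(rhy, art)
    return mean_pool(art_fused) + mean_pool(rhy_fused)


# Toy inputs: 2-dimensional features with different sequence lengths.
shared = [[1.0, 0.0], [0.0, 1.0]]
private_articulation = [[1.0, 0.0]]
private_rhythm = [[0.0, 1.0], [1.0, 1.0]]
fused = hierarchical_fuse(shared, private_articulation, private_rhythm)
print(len(fused))  # length is twice the feature dimension
```

In a trained model, each attention step would normally apply learned query, key, and value projections and the fused vector would feed a classifier; those parts are omitted here to keep the sketch short.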