Journal of Fudan University (Natural Science), 2024, Vol. 63, Issue 3: 344-350.

融合多种语音特征参数的阈下抑郁风险预测

Subthreshold Depression Risk Prediction by Fusing Different Speech Feature Parameters

何婉婷¹, 林琴韵², 杨旭东¹, 严洪立³, 徐攀³, 杨朝阳², 高跃明⁴

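Author information

  • 1. School of Advanced Manufacturing, Fuzhou University, Quanzhou, Fujian 362251; International Joint Laboratory of Intelligent Health Information Sensing, Fuzhou University, Fuzhou, Fujian 350108
  • 2. College of Traditional Chinese Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian 350122
  • 3. International Joint Laboratory of Intelligent Health Information Sensing, Fuzhou University, Fuzhou, Fujian 350108; College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108
  • 4. School of Advanced Manufacturing, Fuzhou University, Quanzhou, Fujian 362251; International Joint Laboratory of Intelligent Health Information Sensing, Fuzhou University, Fuzhou, Fujian 350108; College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108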

Abstract

Voiceprint recognition provides an objective reference for identifying and diagnosing subthreshold depression and for evaluating interventions. This study used different speech modes (reading the /a:/ vowel, text reading, picture description, and free interview) and emotional stimuli (positive, neutral, and negative), fused four categories of speech features (prosody, timbre, spectrum, and formant), and extracted 16 feature parameters, including Mel-frequency cepstral coefficients, speed of sound, fundamental frequency, and formants. A subthreshold depression risk prediction model was built with the random forest classification algorithm and compared with other classifiers. The results show that, before feature fusion, picture description and free interview yielded higher recognition rates than the other speech modes, with positive stimuli giving the better predictions, at accuracies of 72.50% and 67.39%; after feature fusion, reading the /a:/ vowel and free interview reached high accuracies of 93.00% and 85.00%, respectively. This indicates that, after feature fusion, the speech information learned by the model covers not only the subject's emotional state but also the interrelationships among feature types. Reading the /a:/ vowel and free interview retain more vocal tract information: the /a:/ vowel gives long-lasting phonation with sustained intensity, while the free interview offers a comprehensive amount of speech and features and is close to natural speech. These results have some reference value for early risk prediction of subthreshold depression.
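The abstract does not say how the four feature categories are fused, so the following is only a minimal sketch of one common option, feature-level fusion: each feature group is standardized column by column and the groups are then concatenated into one vector per recording. The function names `zscore_columns` and `fuse_features`, the group names, and all numbers are hypothetical illustrations, not the authors' implementation or data.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance.

    Constant columns (zero standard deviation) are mapped to 0.0.
    """
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]


def fuse_features(groups):
    """Feature-level fusion: standardize each group, then concatenate per sample.

    `groups` maps a group name to a list of per-sample feature vectors.
    All groups must describe the same samples in the same order.
    """
    names = sorted(groups)  # fixed order keeps the fused columns reproducible
    n = len(groups[names[0]])
    if any(len(groups[g]) != n for g in names):
        raise ValueError("all feature groups need the same number of samples")
    scaled = {g: zscore_columns(groups[g]) for g in names}
    return [[v for g in names for v in scaled[g][i]] for i in range(n)]


# Made-up values for three recordings; NOT data from the study.
groups = {
    "prosody": [[120.0, 0.8], [180.0, 1.1], [150.0, 0.9]],  # e.g. F0 (Hz), a rate measure
    "formant": [[700.0], [650.0], [720.0]],                 # e.g. first formant (Hz)
    "spectrum": [[-5.2, 3.1], [-4.8, 2.7], [-5.0, 2.9]],    # e.g. two MFCCs
}
fused = fuse_features(groups)
print(len(fused), len(fused[0]))  # 3 samples, 5 fused dimensions
```

The fused rows, together with class labels, could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`. In a real experiment the standardization statistics would normally be computed on the training split only, so that no information leaks from the test recordings.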

Key words

subthreshold depression / speech features / classifier / voiceprint recognition


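Funding

National Key Research and Development Program of China, "Intergovernmental International Science and Technology Innovation Cooperation" Key Special Project (2022YFE0115500)

Publication year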

2024
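Journal of Fudan University (Natural Science)
Fudan University

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.388
ISSN: 0427-7104
Number of references: 5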