Research on an Intelligent Speech Recognition Method for Multi-Person Conversation in Complex Scenes Based on Support Vector Machine

To address the problem that a single feature of multi-person conversation speech carries insufficient information about gender combinations, which leads to inaccurate speech recognition results, an intelligent recognition method for multi-person conversation speech in complex scenes based on the support vector machine (SVM) is proposed. A distance measurement method is used to detect change points in multi-person conversations in complex scenes. The log-likelihood values of any two datasets are calculated to construct a score set. Combined with the T-Test similarity measure, it is determined whether the two datasets differ significantly. An SVM discriminant function is constructed, and the mapping logic of the SVM is used to separate similar voices. The SVM's binary-classification super-linear classifier is used to construct the optimal discriminant function, which is combined with male and female pitch frequency and signal non-resonant frequency features to realize intelligent recognition of multi-person conversation speech. The experimental results show that, for pitch frequency recognition, the amplitude fluctuation ranges obtained by the proposed method are -0.5~0.5 for males and -0.7~0.7 for females, consistent with the experimental data. For signal non-resonant frequency recognition, the frequency fluctuation ranges are -600~600 Hz for males and -360~405 Hz for females; the male range differs from the experimental data by only 50 Hz, while the female range is consistent with the experimental data.
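The significance check between two score sets mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which T-Test variant is used, so the Welch form below is an assumption, as are the score values, the function name `welch_t`, and the decision threshold of 2.0; it is meant only to show the shape of the computation, not the authors' implementation.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two sample means.
    se_a, se_b = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1) + se_b ** 2 / (nb - 1))
    return t, df


# Hypothetical log-likelihood scores for two adjacent speech segments.
scores_a = [-41.2, -40.8, -42.1, -41.5, -40.9]
scores_b = [-47.3, -46.9, -48.0, -47.6, -47.1]

t, df = welch_t(scores_a, scores_b)

# Assumed rule of thumb: a large |t| suggests the segments differ
# significantly, which may indicate a change of speaker.
is_change_point = abs(t) > 2.0
```

In practice the threshold would be derived from a chosen significance level and the degrees of freedom rather than fixed at 2.0.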

Keywords: support vector machine; complex scenarios; multi-person dialogue; intelligent speech recognition

Liu Zihan (刘子寒), Shen Li (沈力), Xi Mengting (奚梦婷), Lu Jiaxin (陆佳鑫), Zhu Jiajia (朱佳佳), Zha Junjie (查俊杰)

Information and Telecommunication Branch, State Grid Jiangsu Electric Power Co., Ltd., Nanjing, Jiangsu 210000, China

2024

Computing Technology and Automation
Hunan University

CSTPCD
Impact factor: 0.295
ISSN: 1003-6199
Year, volume (issue): 2024, 43(4)