Voice navigation anti-interference system based on multimodal lip state recognition

Existing in-vehicle voice navigation devices are susceptible to noise from inside and outside the vehicle and cannot accurately determine the source of a sound signal. To address this problem, a voice navigation anti-interference system based on lip state recognition is proposed. A camera recognizes the driver's lip state in real time so that the start and end points of the driver's speech can be determined accurately. These endpoints switch the voice navigation input signal on and off, which strengthens the driver's control over voice navigation and reduces interference from noise inside and outside the vehicle. To ensure the accuracy and robustness of lip state recognition, a multimodal lip state recognition network based on the fusion of key point and appearance short-term features is proposed. Three sets of experiments were conducted: a test of the effectiveness of the key point short-term features, an ablation test of multimodal feature fusion for lip state recognition, and voice navigation anti-interference tests in a simulated laboratory environment and a real in-vehicle environment. The results show that the proposed key point short-term feature operator enhances the representation of lip state changes by more than 14%, and that the key point-appearance fusion network improves recognition accuracy by more than 8.98% through feature complementarity. The voice navigation anti-interference system built on this network achieves high accuracy (92.6%) and good real-time performance (a detection speed of 35 frames/s). Even under large head pose changes of more than 70° to the driver's left or right, it effectively reduces the interference of noise inside and outside the vehicle with voice control of navigation, demonstrating high robustness.
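
The abstract states that lip state recognition yields the start and end points of the driver's speech, which then gate the navigation input, but it does not say how per-frame results become endpoints. The following is only a minimal sketch of one plausible approach, assuming the recognition network emits one boolean per video frame (lips active or not). The function name `speech_segments` and the parameters `min_on` and `hangover` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def speech_segments(lip_active: List[bool], fps: float = 35.0,
                    min_on: int = 3, hangover: int = 10) -> List[Tuple[float, float]]:
    """Convert per-frame lip-activity flags into (start_s, end_s) intervals.

    min_on   : consecutive active frames required before a segment opens
               (assumed debounce against single-frame false positives).
    hangover : consecutive inactive frames required before a segment closes
               (assumed tolerance for short pauses inside an utterance).
    """
    segments: List[Tuple[float, float]] = []
    start = None      # frame index where the open segment began, if any
    active_run = 0    # length of the current run of active frames
    silent_run = 0    # inactive frames seen since the last active frame

    for i, active in enumerate(lip_active):
        if active:
            active_run += 1
            silent_run = 0
            if start is None and active_run >= min_on:
                start = i - min_on + 1          # back-date to the run's first frame
        else:
            active_run = 0
            if start is not None:
                silent_run += 1
                if silent_run >= hangover:
                    end = i - hangover + 1      # first frame of the closing silence
                    segments.append((start / fps, end / fps))
                    start = None
                    silent_run = 0

    if start is not None:
        # Stream ended mid-utterance: close just after the last active frame.
        segments.append((start / fps, (len(lip_active) - silent_run) / fps))
    return segments
```

In such a design the audio front end would forward microphone samples to the speech recognizer only inside the returned intervals. The two thresholds trade responsiveness against false triggers and would need tuning on real data.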

voice navigation anti-interference system; lip state recognition; key points; appearance features; feature fusion; long short-term memory network

Wang Han (王晗), Chen Yilin (陈怡霖), Ji Yujiao (季钰姣), Du Ruolin (杜若琳)

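School of Information Science and Technology, Nantong University, Nantong, Jiangsu 226019

School of Transportation and Civil Engineering, Nantong University, Nantong, Jiangsu 226019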

2025

Journal of Jiangsu University (Natural Science Edition)
Jiangsu University

Peking University Core Journal
Impact factor: 0.801
ISSN: 1671-7775
Year, volume (issue): 2025, 46(1)