Micro-Motion Excitation and Time Perception for Lip Reading

Temporal information and subtle lip changes are crucial for lip reading. However, existing lip-reading methods neither capture temporal information precisely nor attend to subtle movements. To address this, we propose DMT-GhostNet, a lip-reading method that focuses on minor lip variations and enhances temporal information. First, we introduce the decoupled spatio-temporal enhancement block (DSTE), which decouples a single 3D convolution into a temporal-domain part and a spatial-domain part. Second, building on motion excitation (ME) and the Ghost bottleneck block, we propose the micro-motion bottleneck (M-Ghost) to capture subtle lip motions. Finally, we propose a time perception module, the transformer multi-scale temporal convolution network (TransMS-TCN), which focuses on important temporal sequences and restricts irrelevant information from flowing into the MS-TCN. Experimental results show that DMT-GhostNet achieves an accuracy of 89.21% on the LRW dataset, an improvement of 3.91% over mainstream ResNet-based methods, while reducing the parameter count by nearly 6 M. These results indicate that DMT-GhostNet makes better use of temporal information and focuses on lip details, significantly improving lip-reading performance.
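To make the idea of decoupling a 3D convolution more concrete, the following minimal sketch counts the weights of a full 3D convolution and of a spatial-then-temporal factorization. It is only an illustration under stated assumptions: the abstract does not describe the internals of DSTE, so the helper names (`conv3d_params`, `decoupled_params`), the intermediate channel count `c_mid`, and all channel and kernel sizes below are hypothetical and are not taken from the paper.

```python
def conv3d_params(c_in, c_out, k_t, k_h, k_w):
    """Weight count of a bias-free 3D convolution with a k_t x k_h x k_w kernel."""
    return c_in * c_out * k_t * k_h * k_w


def decoupled_params(c_in, c_mid, c_out, k_t, k_h, k_w):
    """Weight count of a spatial conv (1 x k_h x k_w) followed by a
    temporal conv (k_t x 1 x 1). c_mid is a hypothetical intermediate width."""
    spatial = conv3d_params(c_in, c_mid, 1, k_h, k_w)
    temporal = conv3d_params(c_mid, c_out, k_t, 1, 1)
    return spatial + temporal


# Illustrative sizes only (assumed, not from the paper):
# 64 -> 64 channels with a 3 x 3 x 3 kernel.
full = conv3d_params(64, 64, 3, 3, 3)
split = decoupled_params(64, 64, 64, 3, 3, 3)

print("full 3D conv weights:  ", full)
print("decoupled conv weights:", split)
```

With these assumed sizes the factorized form needs fewer weights than the full 3D kernel, but the saving depends on the channel counts: with very few input channels the factorized form can need more weights than the full kernel. This arithmetic is separate from the roughly 6 M overall parameter reduction reported in the abstract, which refers to the whole model.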

Keywords: lip-reading; GhostNetV2; time dimension; micro-motion excitation

Ma Jinlin (马金林), Lü Xin (吕鑫), Ma Ziping (马自萍), Guo Zhaowei (郭兆伟), Lü Ke (吕科)

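School of Computer Science and Engineering, North Minzu University, Yinchuan, Ningxia 750021

School of Mathematics and Information Science, North Minzu University, Yinchuan, Ningxia 750021

School of Computer and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049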

2024

Acta Electronica Sinica (电子学报)
Chinese Institute of Electronics (中国电子学会)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.237
ISSN: 0372-2112
Year, volume (issue): 2024, 52(11)