Micro-Motion Excitation and Time Perception for Lip Reading
Temporal information and subtle lip changes are crucial for lip reading. However, existing lip-reading methods neither capture temporal information accurately nor attend to subtle movements. In response, we propose a lip-reading method named DMT-GhostNet that emphasizes minor lip variations and enhances temporal information. We introduce the decoupled spatio-temporal enhancement (DSTE) block, which decouples a single 3D convolution into a temporal component and a spatial component. Building on motion excitation (ME) and the Ghost bottleneck block, we introduce the micro-motion bottleneck (M-Ghost) to detect subtle lip motions. We further propose the transformer multi-scale temporal convolution network (TransMS-TCN), which focuses on important temporal sequences and keeps irrelevant information from flowing into the MS-TCN. Experimental results show that DMT-GhostNet achieves an accuracy of 89.21% on the LRW dataset, 3.91 percentage points higher than mainstream ResNet-based methods, while reducing the parameter count by nearly 6 M. This indicates that DMT-GhostNet effectively utilizes temporal information and focuses on lip details, significantly improving lip-reading performance.
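To illustrate why decoupling a 3D convolution can reduce the parameter count, the sketch below compares the weight count of a full 3D convolution with that of a spatial convolution followed by a temporal convolution. It is a minimal sketch under stated assumptions: the abstract does not give the internal layout of the DSTE block, so the spatial-then-temporal ordering, the intermediate width `c_mid`, the channel widths, and the kernel sizes are all hypothetical values chosen for illustration, and the function names are not taken from the paper.

```python
def conv3d_weights(c_in, c_out, k_t, k_h, k_w):
    """Weight count of a full 3D convolution (bias terms ignored)."""
    return c_in * c_out * k_t * k_h * k_w


def decoupled_weights(c_in, c_mid, c_out, k_t, k_h, k_w):
    """Weight count of a spatial (1 x k_h x k_w) convolution followed by
    a temporal (k_t x 1 x 1) convolution (bias terms ignored).

    c_mid is an assumed intermediate channel width, not a value from the paper.
    """
    spatial = c_in * c_mid * k_h * k_w   # acts within each frame
    temporal = c_mid * c_out * k_t       # acts across frames at each pixel
    return spatial + temporal


# Illustrative sizes only: 64 -> 64 channels with a 3 x 3 x 3 kernel.
full = conv3d_weights(64, 64, 3, 3, 3)
split = decoupled_weights(64, 64, 64, 3, 3, 3)
print(full, split)  # 110592 49152
```

With these illustrative sizes the factorized pair needs fewer than half the weights of the full 3D kernel. The saving depends on the channel widths and kernel sizes chosen, so it should not be read as the reduction reported for DMT-GhostNet.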