Facial Landmark Detection Based on Hierarchical Self-Attention Network
Facial landmark detection, a key step in facial image processing, is commonly performed with coordinate-regression methods based on deep neural networks, which offer fast processing speed. However, the high-level network features used for regression lose spatial structural information and lack fine-grained representation ability, which reduces detection accuracy. To address this issue, a facial landmark detection algorithm based on a hierarchical self-attention network is proposed. To extract image semantic features with finer-grained representation ability, a multi-level feature fusion module based on the self-attention mechanism is constructed, achieving cross-level fusion of high-level semantic features and low-level spatial features. On this basis, a multi-task training method that jointly learns facial landmark detection and facial pose angle estimation is designed, optimizing the network's estimate of the overall orientation and pose of the face and thereby improving landmark detection accuracy. Experimental results on the mainstream facial landmark datasets 300W and WFLW show that, compared with methods such as SAAT and AnchorFace, the proposed method effectively improves detection accuracy, achieving normalized mean errors of 3.23% and 4.55% respectively, which are 0.37 and 0.59 percentage points lower than the baseline model. The failure rate on the WFLW dataset is 3.56%, 2.86 percentage points lower than the baseline model, demonstrating that the proposed method extracts more robust and fine-grained representation features.
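To make the cross-level fusion idea concrete, the following is a minimal illustrative sketch (not the paper's actual implementation) of scaled dot-product cross-attention, in which high-level semantic features act as queries over low-level spatial features. All shapes, names, and the residual connection are assumptions introduced for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_level_fusion(high, low):
    """Fuse high-level and low-level feature maps via cross-attention.

    high: (N_h, C) flattened high-level semantic features (queries)
    low:  (N_l, C) flattened low-level spatial features (keys/values)
    Returns fused features of shape (N_h, C).
    """
    d_k = high.shape[-1]
    scores = high @ low.T / np.sqrt(d_k)  # (N_h, N_l) attention logits
    attn = softmax(scores, axis=-1)       # each query attends over all low-level positions
    fused = attn @ low                    # aggregate fine-grained spatial detail
    return fused + high                   # residual keeps the semantic content

# Example: 7x7 high-level map and 14x14 low-level map, both with 64 channels.
rng = np.random.default_rng(0)
high = rng.standard_normal((49, 64))
low = rng.standard_normal((196, 64))
fused = cross_level_fusion(high, low)
print(fused.shape)  # (49, 64)
```

In a full model, learned query/key/value projections and multiple attention heads would replace the raw dot products shown here; the sketch only illustrates how high-level queries can recover low-level spatial information lost in the regression features.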