Non-Autoregressive Sign Language Translation Technology Based on Transformer and Multimodal Alignment
To address the challenges of aligning multimodal data and of slow translation speed in sign language translation, a Transformer-based non-autoregressive sign language translation model (Trans-SLT-NA) is proposed in this paper, which utilizes a self-attention mechanism. Additionally, it incorporates a contrastive learning loss function to align the multimodal data. By capturing the contextual and interaction information between the input sequence (sign language videos) and the target sequence (text), the proposed model is able to translate sign language into natural language in a single step. The effectiveness of the proposed model is evaluated on publicly available datasets, including PHOENIX-2014-T (German), CSL (Chinese), and How2Sign (English). Results demonstrate that the proposed method achieves a significant improvement in translation speed, ranging from 11.6 to 17.6 times faster than autoregressive models, while maintaining comparable performance in terms of the BiLingual Evaluation Understudy (BLEU-4) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics.
Sign language translation; self-attention mechanism; non-autoregressive translation; deep learning; multimodal data alignment
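The abstract does not specify the form of the contrastive learning loss used for multimodal alignment. As an illustration only, a common choice for aligning paired video and text representations is a symmetric InfoNCE-style objective; the sketch below assumes pooled per-sequence embeddings and a `temperature` hyperparameter, both of which are assumptions rather than details from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss aligning video and text embeddings.

    video_emb, text_emb: (batch, dim) pooled per-sequence embeddings;
    matched video/text pairs share the same batch index. Hypothetical
    sketch -- not the paper's actual loss definition.
    """
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # (batch, batch) cosine-similarity matrix, scaled by temperature.
    logits = video_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Pull matched pairs together and push mismatched pairs apart,
    # in both the video-to-text and text-to-video directions.
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.t(), targets)
    return (loss_v2t + loss_t2v) / 2
```

Such a loss would be minimized jointly with the translation objective, encouraging the encoder to place a sign language video and its target sentence close together in a shared embedding space.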