Research on Minnan Dialect Lip Reading Based on Transformer-LSTM
An encoder-decoder model based on the Transformer and long short-term memory (LSTM) is proposed for end-to-end, sentence-level lip reading of the Minnan dialect. The encoder uses a spatiotemporal convolutional neural network followed by a Transformer encoder to extract spatiotemporal features from lip image sequences. The decoder uses an LSTM network combined with a cross-attention mechanism to predict the text sequence. Experiments were conducted on a self-built Minnan dialect lip-reading dataset, and the results show that the model effectively improves lip-reading recognition accuracy.
lip reading; Minnan language; Transformer; long short-term memory (LSTM); spatiotemporal convolutional neural networks; attention mechanism; end-to-end model
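
To illustrate the cross-attention step between the decoder and the encoder, the following is a minimal sketch of how one decoder state could attend over the encoder's frame features. It assumes scaled dot-product attention without learned projections, which the abstract does not specify; the names `cross_attention`, `decoder_state`, and `encoder_outputs` and the toy values are hypothetical and serve only as illustration, not as the model's actual implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, encoder_outputs):
    """Attend from one decoder state over all encoder frame features.

    query: decoder hidden state as a list of d floats
           (stands in for an LSTM hidden state; an assumption).
    encoder_outputs: list of T frame features, each a list of d floats
           (stands in for the Transformer encoder output; an assumption).
    Returns the context vector and the attention weights.
    """
    d = len(query)
    # Scaled dot-product score between the query and every encoder frame.
    scores = [
        sum(q * k for q, k in zip(query, frame)) / math.sqrt(d)
        for frame in encoder_outputs
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder frames.
    context = [
        sum(w * frame[i] for w, frame in zip(weights, encoder_outputs))
        for i in range(d)
    ]
    return context, weights


# Toy example: 3 encoder frames with 2-dimensional features.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
context, weights = cross_attention(decoder_state, encoder_outputs)
```

In a decoder of this kind, the context vector would typically be combined with the LSTM state at each step before predicting the next output token; how the proposed model combines them is not stated in the abstract.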