In the Mongolian speech recognition task, the Transformer model cannot learn the correspondence between Mongolian words containing control symbols and their speech, and therefore fails to adapt to the Mongolian language. To address this, a Mongolian word encoding method for the Transformer model is proposed. The method mixes Mongolian letter features with word features: by incorporating letter information, the Transformer model can distinguish Mongolian words containing control symbols and learn the correspondence between Mongolian words and their pronunciations. On the IMUT-MC dataset, a Transformer model is constructed and ablation and comparison experiments on the word feature encoding method are carried out. The ablation results show that the word feature encoding method reduces HWER, WER, and SER by 23.4%, 6.9%, and 2.6%, respectively; the comparison results show that the word feature encoding method outperforms all compared methods, with HWER and WER reaching 11.8% and 19.8%.
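A minimal sketch of the mixed letter/word encoding idea, assuming a PyTorch implementation (the module, dimension, and parameter names here are hypothetical, not the paper's exact architecture): each token's representation is the sum of its word embedding and a pooled embedding of its letters, so two words that differ only in control symbols remain distinguishable to the encoder.

```python
import torch
import torch.nn as nn

class MixedWordLetterEmbedding(nn.Module):
    """Hypothetical mixed encoding: word embedding + pooled letter embeddings."""

    def __init__(self, word_vocab: int, letter_vocab: int, d_model: int, pad_id: int = 0):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, d_model)
        self.letter_emb = nn.Embedding(letter_vocab, d_model, padding_idx=pad_id)
        self.pad_id = pad_id

    def forward(self, word_ids: torch.Tensor, letter_ids: torch.Tensor) -> torch.Tensor:
        # word_ids:   (batch, seq)           one id per Mongolian word
        # letter_ids: (batch, seq, max_len)  letter ids per word, right-padded
        w = self.word_emb(word_ids)                              # (B, S, D)
        l = self.letter_emb(letter_ids)                          # (B, S, L, D)
        mask = (letter_ids != self.pad_id).unsqueeze(-1).float() # (B, S, L, 1)
        # Mean-pool letter embeddings, ignoring padding positions.
        l = (l * mask).sum(dim=2) / mask.sum(dim=2).clamp(min=1.0)
        return w + l                                             # mixed encoding
```

The resulting embeddings would then be fed to a standard Transformer encoder in place of plain word embeddings; summation keeps the model dimension unchanged, though concatenation followed by a projection would be an equally plausible mixing choice under these assumptions.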