Pinyin Uighur Language Recognition Method Based on TI-FastText
Uyghur is one of the most important languages in the Xinjiang Uyghur Autonomous Region of China, and Pinyin Uyghur emerged in response to the difficulties of processing and retrieving Uyghur text on computers. Pinyin Uyghur makes the digitization of Uyghur more convenient. However, its characteristics make it difficult for computers to recognize: it lacks a completely unified standard, it is predominantly colloquial, it is used mainly on online social media, and data are hard to collect. To address this, a method that fuses TF-IDF and FastText is introduced, for the first time, to identify Pinyin Uyghur. Compared with traditional methods, the innovations of this method are that TF-IDF can extract the unique linguistic characteristics of Pinyin Uyghur in depth; that fusing it with the FastText model reduces the limitations of a single model; and that the method achieves more accurate Uyghur recognition by exploiting its high sensitivity to word order and low-frequency vocabulary. Meanwhile, to improve the robustness of the model, a data forgery technique is introduced to generate a large number of multilingual datasets. The experimental results show that the accuracy of the method in identifying Pinyin Uyghur can exceed 95%. The development of Pinyin Uyghur language recognition technology can help to better process and manage Uyghur information in the digital era, and can promote the research and application of natural language processing and artificial intelligence in Uyghur recognition.
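As a rough illustration of the idea of combining TF-IDF weighting with FastText-style subword features, the following minimal Python sketch weights character n-grams (wrapped in `<` and `>`, as FastText does for subwords) by TF-IDF and classifies text by cosine similarity to per-language centroids. It is only one possible reading of such a fusion: the abstract does not specify how the two components are combined, the sketch does not use the FastText library or trained embeddings, and a nearest-centroid classifier is swapped in for brevity. All function names, the n-gram range, and the four toy training strings are assumptions made for illustration and are not taken from the paper or its datasets.

```python
import math
from collections import Counter


def char_ngrams(word, n_min=2, n_max=4):
    """FastText-style subword units: character n-grams of '<word>'."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(token) - n + 1)]


def doc_ngrams(text):
    """All character n-grams of all whitespace-separated words in a text."""
    return [g for w in text.lower().split() for g in char_ngrams(w)]


def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram seen in docs."""
    n_docs = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(doc_ngrams(d)))
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf_vector(text, idf):
    """L2-normalised sparse TF-IDF vector over known n-grams."""
    tf = Counter(doc_ngrams(text))
    total = sum(tf.values()) or 1
    vec = {g: (c / total) * idf[g] for g, c in tf.items() if g in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def cosine(a, b):
    """Dot product of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(g, 0.0) for g, v in a.items())


def fit_centroids(samples, idf):
    """Average the TF-IDF vectors of each label into one unit-length centroid."""
    sums = {}
    for text, label in samples:
        acc = sums.setdefault(label, Counter())
        for g, v in tfidf_vector(text, idf).items():
            acc[g] += v
    centroids = {}
    for label, acc in sums.items():
        norm = math.sqrt(sum(v * v for v in acc.values())) or 1.0
        centroids[label] = {g: v / norm for g, v in acc.items()}
    return centroids


def predict(text, idf, centroids):
    """Return the label whose centroid is most similar to the text."""
    vec = tfidf_vector(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy, hand-written samples (assumption: illustrative only, not the paper's data).
train = [
    ("yaxshimusiz rehmet", "ug"),
    ("men uyghurche sözleymen", "ug"),
    ("hello thank you", "en"),
    ("i speak english", "en"),
]
idf = fit_idf([text for text, _ in train])
centroids = fit_centroids(train, idf)
print(predict("rehmet", idf, centroids))
print(predict("thank you", idf, centroids))
```

Working on character n-grams rather than whole words is what lets a sketch like this tolerate the spelling variation that comes with a non-standardized romanization; a real system would replace the centroid step with a trained classifier and far more data.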