Tibetan Pre-training Model Based on Attention Heads and Part-of-Speech Fusion
To acquire superior Tibetan representations and enhance the model's understanding of Tibetan features, part-of-speech information was incorporated into a Tibetan pre-trained language model. Meanwhile, to improve performance on downstream tasks, the optimal number of attention heads for the Tibetan pre-trained language model was explored through comparative experiments. The results show that pre-trained language models with 12 attention heads perform well across multiple classification tasks. Furthermore, after incorporating part-of-speech information into the pre-trained language models, the macro-F1 values on the text, title, and sentiment classification tasks increase by 0.57%, 0.92%, and 1.01%, respectively. It is concluded that, with part-of-speech features incorporated, the model better captures the linguistic structure and grammatical rules of Tibetan.
attention mechanism; part-of-speech; pre-trained language models; text classification; sentiment classification
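To make the fusion idea in the abstract concrete, the following is a minimal PyTorch sketch of one common way to inject part-of-speech information into a 12-head Transformer encoder: embedding the POS tags and adding them to the token embeddings, much like BERT's segment embeddings. All names, dimensions, and the additive-fusion choice here are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of part-of-speech fusion in a Transformer encoder.
# Hypothetical names and sizes; the additive fusion is an assumption,
# not necessarily the method used in the paper.
import torch
import torch.nn as nn

class PosFusedEncoder(nn.Module):
    def __init__(self, vocab_size, pos_tag_count,
                 hidden_size=768, num_heads=12, num_layers=12):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden_size)
        # Separate embedding table for part-of-speech tags (the fused feature).
        self.pos_tag_emb = nn.Embedding(pos_tag_count, hidden_size)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, token_ids, pos_tag_ids):
        # Fuse POS information by adding its embedding to the token
        # embedding, analogous to segment embeddings in BERT.
        x = self.token_emb(token_ids) + self.pos_tag_emb(pos_tag_ids)
        return self.encoder(x)

# Usage: a batch of 2 sequences, 8 subwords each, with aligned POS tags.
model = PosFusedEncoder(vocab_size=30000, pos_tag_count=30)
tokens = torch.randint(0, 30000, (2, 8))
pos_tags = torch.randint(0, 30, (2, 8))
hidden = model(tokens, pos_tags)  # shape: (2, 8, 768)
```

With this kind of fusion, the head count (here `num_heads=12`, matching the best-performing configuration reported in the abstract) can be varied independently of the POS embeddings, which is what makes the comparative head-count experiments possible.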