Recently, pre-training followed by fine-tuning has become a new paradigm in natural language processing. This paper collects and organizes a Tibetan text corpus of 4.655 billion characters, pre-trains a Tibetan language model based on the UniLM architecture, and enhances it with Tibetan text features. Experiments show that this method achieves remarkable results on four downstream tasks, such as Tibetan La-case sentence classification and Tibetan text classification.
Keywords
Tibetan pre-trained language model / text data augmentation method / UniLM model