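To improve the accuracy of Tibetan news text classification, this paper proposes a method that fuses the Chinese Minority Pre-trained Language Model (CINO) with Text Graph Convolutional Networks (TextGCN), namely the CINO-TextGCN model. To evaluate the model's classification performance on Tibetan text effectively, the authors built TNEWS, a relatively large-scale and relatively high-quality public dataset of Tibetan news text (https://github.com/LG2016/CINO-TextGCN). Experiments show that CINO-TextGCN reaches an accuracy of 74.20% on the public dataset TNCC and 83.96% on TNEWS. The fused model therefore captures the semantics of Tibetan text well and improves Tibetan text classification performance.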
CINO-TextGCN: A Model of Tibetan News Text Classification Combined With CINO and TextGCN
To improve the accuracy of traditional Tibetan news text classification, this paper proposes a method that integrates the Chinese Minority Pre-trained Language Model (CINO) and Text Graph Convolutional Networks (TextGCN), namely the CINO-TextGCN model, which combines the advantages of inductive learning and transductive learning. To address the lack of a public, unified Tibetan corpus, a large-scale and high-quality public dataset of Tibetan news text, TNEWS (https://github.com/LG2016/CINO-TextGCN), is constructed. The experimental results show that CINO-TextGCN achieves an accuracy of 74.20% on the public dataset TNCC and 83.96% on TNEWS, implying that the CINO-TextGCN model can better capture the semantics of Tibetan texts and thus improve Tibetan text classification performance.