CINO-TextGCN: A Model of Tibetan News Text Classification Combined With CINO and TextGCN
Li Guo 1, Yang Jin 2, Chen Chen 1
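Author information
- 1. School of Information Science and Technology, Tibet University, Lhasa, Tibet 850000; Engineering Research Center of Tibetan Information Technology, Ministry of Education, Tibet University, Lhasa, Tibet 850000
- 2. School of Cyber Science and Engineering, Sichuan University, Chengdu, Sichuan 610000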
Abstract
To improve the accuracy of Tibetan news text classification, this paper proposes CINO-TextGCN, a model that integrates the Chinese Minority Pretrained Language Model (CINO) with Text Graph Convolutional Networks (TextGCN) and thereby combines the advantages of inductive and transductive learning. To address the lack of a public, unified Tibetan corpus and to evaluate the model's classification performance on Tibetan text, the authors constructed TNEWS (https://github.com/LG2016/CINO-TextGCN), a relatively large-scale, high-quality public dataset of Tibetan news text. In experiments, CINO-TextGCN achieved an accuracy of 74.20% on the public dataset TNCC and 83.96% on TNEWS. These results suggest that the integrated model captures the semantics of Tibetan text well and improves Tibetan text classification performance.
Key words
Tibetan / Graph Convolutional Networks (GCN) / integrated model / news text / text classification
Funding
National Natural Science Foundation of China (62162057)
National Natural Science Foundation of China (61872254)
Publication year
2024