Research on Tibetan Short Text Classification Based on GraphSAGE Network
Text classification is an important research direction in natural language processing. Tibetan text classification is challenged by data scarcity, the complexity of extracting linguistic features, and the diversity of discourse structures. In this paper, we use a graph neural network model as the framework. First, building on "syllable-syllable" and "syllable-document" edges, we incorporate document features to dynamically construct "document-document" edges, mining the global features of short texts. We also increase the sliding-window size to find its optimal value. Second, to address the syllable sparsity of Tibetan short texts, we introduce GraphSAGE as the base model and explore the performance differences among aggregation functions. Finally, to capture the heterogeneity of relationships between nodes, we propose a feature-weighting approach based on average pooling. Experiments on the TNCC title dataset show that our model reaches 62.50% accuracy, outperforming GGN, the original GraphSAGE, and the pre-trained language model CINO by 2.56%, 1%, and 2.4%, respectively.
Keywords: graph neural network; Tibetan text classification; TNCC dataset
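The graph construction summarized in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify edge weights, so the syllable-syllable weights here use positive PMI over sliding windows and the syllable-document weights use TF-IDF (TextGCN-style choices), while the "document-document" rule (cosine similarity of TF-IDF vectors kept above a hypothetical `sim_threshold`) is purely illustrative. Syllables are split naively on the Tibetan tsheg (U+0F0B) and shad (U+0F0D) marks. The helper names `split_syllables` and `build_graph` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np

TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG
SHAD = "\u0f0d"   # TIBETAN MARK SHAD (clause delimiter)


def split_syllables(text):
    """Naive syllable segmentation: split on tsheg and shad marks."""
    parts = text.replace(SHAD, TSHEG).split(TSHEG)
    return [p.strip() for p in parts if p.strip()]


def build_graph(docs, window=3, sim_threshold=0.5):
    """Build a symmetric adjacency matrix over [documents | syllables].

    docs: list of syllable lists. Nodes 0..n_docs-1 are documents; the
    remaining nodes are syllables in sorted vocabulary order.
    """
    n_docs = len(docs)
    vocab = sorted({s for d in docs for s in d})
    idx = {s: n_docs + i for i, s in enumerate(vocab)}
    A = np.eye(n_docs + len(vocab))  # self-loops

    # Syllable-syllable edges: positive PMI over sliding windows (assumed weighting).
    windows = []
    for d in docs:
        if len(d) <= window:
            windows.append(set(d))
        else:
            windows.extend(set(d[i:i + window]) for i in range(len(d) - window + 1))
    single = Counter(s for w in windows for s in w)
    pair = Counter(p for w in windows for p in combinations(sorted(w), 2))
    n_win = len(windows)
    for (u, v), c in pair.items():
        pmi = math.log(c * n_win / (single[u] * single[v]))
        if pmi > 0:
            A[idx[u], idx[v]] = A[idx[v], idx[u]] = pmi

    # Syllable-document edges: TF-IDF with smoothed IDF (assumed weighting).
    df = Counter(s for d in docs for s in set(d))
    tfidf = np.zeros((n_docs, len(vocab)))
    for j, d in enumerate(docs):
        for s, c in Counter(d).items():
            w = (c / len(d)) * (math.log((1 + n_docs) / (1 + df[s])) + 1)
            tfidf[j, idx[s] - n_docs] = w
            A[j, idx[s]] = A[idx[s], j] = w

    # Document-document edges (hypothetical rule): cosine similarity of
    # TF-IDF vectors, kept only when it reaches sim_threshold.
    norms = np.linalg.norm(tfidf, axis=1, keepdims=True)
    unit = tfidf / np.where(norms == 0, 1, norms)
    sim = unit @ unit.T
    for i, j in combinations(range(n_docs), 2):
        if sim[i, j] >= sim_threshold:
            A[i, j] = A[j, i] = sim[i, j]
    return A, vocab
```

The `window` argument plays the role of the sliding-window size the abstract tunes; rebuilding the graph with several values and comparing validation accuracy is one simple way to search for it.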
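The GraphSAGE aggregation step, and the comparison of aggregation functions, can be illustrated with a NumPy forward pass. This sketch assumes full-neighborhood aggregation (no neighbor sampling) over a weighted adjacency matrix and omits training. It implements the mean and max-pooling aggregators of the original GraphSAGE; the `weighted_mean` option, an edge-weighted average of neighbor features, is only one hypothetical reading of the "feature-weighting approach based on average pooling", which the abstract does not define. The names `aggregate` and `sage_layer` are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def aggregate(A, H, agg="mean", W_pool=None, b_pool=None):
    """Aggregate neighbor features for every node.

    A: (N, N) weighted adjacency (self-loops ignored); H: (N, F) features.
    agg: "mean", "maxpool", or "weighted_mean" (hypothetical reading of the
    paper's average-pooling feature weighting: edge-weighted average).
    """
    mask = (A > 0) & ~np.eye(len(A), dtype=bool)
    has_neigh = mask.any(axis=1)
    if agg == "mean":
        M = mask.astype(float)
        return (M @ H) / np.maximum(M.sum(axis=1, keepdims=True), 1.0)
    if agg == "weighted_mean":
        Wt = np.where(mask, A, 0.0)
        return (Wt @ H) / np.maximum(Wt.sum(axis=1, keepdims=True), 1e-12)
    if agg == "maxpool":
        P = relu(H @ W_pool + b_pool)  # per-neighbor transform, as in GraphSAGE pooling
        stacked = np.where(mask[:, :, None], P[None, :, :], -np.inf)
        out = stacked.max(axis=1)      # element-wise max over neighbors
        out[~has_neigh] = 0.0          # isolated nodes get a zero message
        return out
    raise ValueError(f"unknown aggregator: {agg}")


def sage_layer(A, H, W_self, W_neigh, agg="mean", W_pool=None, b_pool=None):
    """h_v' = ReLU(h_v W_self + AGG(N(v)) W_neigh), then row-wise L2 norm.

    Splitting the weight into W_self and W_neigh is equivalent to applying
    one matrix to concat(h_v, AGG(N(v))), as in the original GraphSAGE.
    """
    neigh = aggregate(A, H, agg, W_pool, b_pool)
    out = relu(H @ W_self + neigh @ W_neigh)
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)
```

With the max-pooling aggregator, `W_neigh` must match the pooled dimension (the column count of `W_pool`) rather than the input feature size. Swapping `agg` while holding the weights fixed isolates the effect of the aggregation function, which is the kind of comparison the abstract describes.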