Study on Tibetan Short Text Classification Based on DAN and FastText
As Tibetan information becomes increasingly integrated into social life, more and more Tibetan short text data is available on online platforms. To address the low classification performance of traditional methods on Tibetan short texts, a Tibetan short text classification model based on DAN-FastText is proposed. The model first uses the FastText network to perform unsupervised training on a large-scale Tibetan corpus, yielding a set of pre-trained Tibetan syllable vectors. It then uses these pre-trained vectors to convert each Tibetan short text into a sequence of syllable vectors, feeds the syllable vectors into a deep averaging network (DAN), and fuses the sentence vector features trained by the FastText network at the output stage. Finally, classification is completed by a fully connected layer and a softmax layer. On the news headline dataset of the publicly available Tibetan News Classification Corpus (TNCC), the model achieves a Macro-F1 of 64.53%, which is 2.81% higher than that of the TiBERT model and 6.14% higher than that of the GCN model, indicating that the fusion model classifies Tibetan short texts more effectively.
Tibetan short text classification; Feature fusion; Deep averaging networks; FastText
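
The pipeline summarized in the abstract (average the syllable vectors, pass them through a feed-forward layer, fuse with a FastText sentence vector, then apply a fully connected layer and softmax) can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the abstract does not state how the features are fused, so concatenation is assumed here, and the dimensions, random weights, the function names, and the stand-in sentence vector are all hypothetical.

```python
import math
import random


def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dense(x, weights, bias):
    """Affine layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def classify(syllable_vectors, sentence_vector, params):
    """Return class probabilities for one short text."""
    # DAN branch: average the syllable vectors, then one hidden layer.
    hidden = relu(dense(average(syllable_vectors),
                        params["W_hidden"], params["b_hidden"]))
    # Fusion (assumed to be concatenation) with the sentence vector.
    fused = hidden + list(sentence_vector)
    # Fully connected output layer followed by softmax.
    return softmax(dense(fused, params["W_out"], params["b_out"]))


# Toy setup: sizes are arbitrary, weights are random and untrained.
rng = random.Random(0)
dim, hidden_size, n_classes = 8, 6, 4
params = {
    "W_hidden": random_matrix(rng, hidden_size, dim),
    "b_hidden": [0.0] * hidden_size,
    "W_out": random_matrix(rng, n_classes, hidden_size + dim),
    "b_out": [0.0] * n_classes,
}
syllables = random_matrix(rng, 5, dim)  # a text of 5 syllables
sentence = average(syllables)           # stand-in for a FastText sentence vector
probs = classify(syllables, sentence, params)
```

In a real system the syllable vectors and the sentence vector would come from the pre-trained FastText model, and the weights would be learned on the labeled TNCC headlines rather than drawn at random.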