SHORT TEXT CLASSIFICATION METHOD BASED ON WORD-TOPIC-DOCUMENT HETEROGENEOUS NETWORK
Existing short text classification methods ignore the semantic relevance between long-distance words and the latent topics shared between documents. To address this issue, a novel short text classification method based on a word-topic-document heterogeneous network (WTDHN) is proposed. The method first obtains contextual semantic vectors of words through Word2vec. A word correlation matrix is then constructed so that sufficient word co-occurrence information is available to enhance the learning of the latent topic distribution and the word distribution. Next, a heterogeneous network containing word, topic, and document nodes is constructed. High-order neighborhood information among the word, topic, and document nodes is learned through graph convolution, which improves the semantic representation of short texts. Results on five public short text datasets show that the proposed method improves classification accuracy by 1.56% on average over the benchmark models.
Word-topic-document heterogeneous network; Word co-occurrence; Document-topic distribution; Short text classification
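
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the edge types, edge weights, topic model, or layer sizes, so everything below is an assumption made for illustration: the toy corpus, the document-level co-occurrence counts standing in for the word correlation matrix, the hand-written `topic_word` and `doc_topic` distributions standing in for a trained topic model, the fixed weight matrices, and all function names. Word2vec features are replaced by one-hot node features to keep the example self-contained.

```python
import math
from itertools import combinations


def cooccurrence(docs, vocab):
    """Count, for each word pair, the number of documents containing both words."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = [[0.0] * len(vocab) for _ in vocab]
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[index[a]][index[b]] += 1.0
            counts[index[b]][index[a]] += 1.0
    return counts


def build_adjacency(docs, vocab, topic_word, doc_topic):
    """Symmetric adjacency over nodes ordered as [words | topics | documents]."""
    n_w, n_t, n_d = len(vocab), len(topic_word), len(docs)
    n = n_w + n_t + n_d
    adj = [[0.0] * n for _ in range(n)]

    def link(i, j, weight):
        if weight > 0:
            adj[i][j] = adj[j][i] = weight

    co = cooccurrence(docs, vocab)
    for i in range(n_w):
        for j in range(i + 1, n_w):
            link(i, j, co[i][j])                             # word-word edges
    for t in range(n_t):
        for w in range(n_w):
            link(n_w + t, w, topic_word[t][w])               # topic-word edges
    for d, doc in enumerate(docs):
        for t in range(n_t):
            link(n_w + n_t + d, n_w + t, doc_topic[d][t])    # document-topic edges
        for w, word in enumerate(vocab):
            link(n_w + n_t + d, w, float(doc.count(word)))   # document-word edges
    return adj


def normalize(adj):
    """Add self-loops, then apply symmetric normalization D^(-1/2) (A + I) D^(-1/2)."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(a_hat, features, weights):
    """One graph convolution step: ReLU(A_hat @ features @ weights)."""
    hidden = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Toy corpus of three short texts (hypothetical data).
docs = [["cheap", "flight", "deal"], ["flight", "delay"], ["stock", "deal"]]
vocab = sorted({w for doc in docs for w in doc})

# Hand-written distributions standing in for a trained topic model (assumption).
topic_word = [[0.3, 0.2, 0.2, 0.3, 0.0], [0.0, 0.5, 0.0, 0.0, 0.5]]
doc_topic = [[0.9, 0.1], [1.0, 0.0], [0.2, 0.8]]

adj = build_adjacency(docs, vocab, topic_word, doc_topic)
a_hat = normalize(adj)
n = len(adj)

# One-hot node features and fixed weights; a real model would learn the weights.
features = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
w1 = [[0.1 * ((i + j) % 3) for j in range(4)] for i in range(n)]
w2 = [[0.5, 0.1], [0.2, 0.4], [0.3, 0.3], [0.1, 0.6]]

h1 = gcn_layer(a_hat, features, w1)  # aggregates one-hop neighbors
h2 = gcn_layer(a_hat, h1, w2)        # second layer reaches two-hop neighbors
doc_embeddings = h2[len(vocab) + len(topic_word):]
```

Stacking two layers is what lets a document node receive information from nodes two hops away, for example from another document that shares a topic node with it. In a trained model the rows of `doc_embeddings` would feed a softmax classifier over the class labels.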