
Short Text Classification Method Based on Word-Topic-Document Heterogeneous Network

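Existing classification methods do not account for the semantic correlation between long-distance words or for latent topics shared across texts. To address this, a short text classification method based on a word-topic-document heterogeneous network (WTDHN) is proposed. Word2vec is used to train contextual semantic vectors for words. A word correlation matrix is constructed so that abundant word co-occurrence information enhances short-text semantics at every level. A heterogeneous network with words, topics and documents as nodes is then built, and graph convolution learns the high-order neighborhood information between nodes to enrich short-text semantics. Compared with the baseline classification models, the method improves classification accuracy by 1.56% on average across five public short text datasets.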
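The abstract does not say exactly how the word correlation matrix is computed. As an illustration only, the sketch below swaps in positive pointwise mutual information (PPMI) over co-occurrence windows, the scheme used by TextGCN-style graphs; the function name `ppmi_word_matrix` and the `window` parameter are hypothetical. Treating each whole short text as one window (`window=None`) lets two words count as correlated however far apart they are, which is one plausible way to capture long-distance word relevance.

```python
import math
from collections import Counter
from itertools import combinations


def ppmi_word_matrix(docs, window=None):
    """Positive-PMI word correlation matrix from co-occurrence counts.

    Assumption (not from the paper): when window is None, each short text is a
    single co-occurrence window, so any two words in the same text co-occur.
    """
    windows = []
    for tokens in docs:
        if window is None or len(tokens) <= window:
            windows.append(tokens)
        else:
            windows.extend(tokens[i:i + window] for i in range(len(tokens) - window + 1))

    n_windows = len(windows)
    word_count = Counter()  # number of windows containing each word
    pair_count = Counter()  # number of windows containing each unordered word pair
    for win in windows:
        unique = sorted(set(win))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))

    vocab = sorted(word_count)
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in vocab]
    for (a, b), count in pair_count.items():
        pmi = math.log(count * n_windows / (word_count[a] * word_count[b]))
        if pmi > 0:  # keep only positively associated pairs
            matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = pmi
    return vocab, matrix
```

For example, `ppmi_word_matrix([["a", "b", "c"], ["a", "b"], ["c", "d"]])` links `a`-`b` and `c`-`d` with weight log(1.5) and leaves `a`-`c` at zero, because `a` and `c` co-occur less often than chance would predict.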
SHORT TEXT CLASSIFICATION METHOD BASED ON WORD-TOPIC-DOCUMENT HETEROGENEOUS NETWORK
Existing short text classification methods ignore the semantic relevance between long-distance words and the potential topic sharing between documents. To solve this issue, a novel short text classification method based on a word-topic-document heterogeneous network (WTDHN) is proposed. The method obtains contextual semantic vectors of words through Word2vec. A word correlation matrix is constructed to enhance the learning of the potential topic distribution and the word distribution with sufficient word co-occurrence information. A heterogeneous network containing word, topic and document nodes is then constructed. The high-order neighborhood information between word, topic and document nodes is learned through graph convolution, improving the semantic expression of short texts. Results on five public short text datasets show that the proposed method improves classification accuracy by 1.56% on average compared with the benchmark models.
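To make the graph-convolution step concrete, here is a minimal sketch assuming the standard Kipf-Welling GCN propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), applied to one adjacency matrix that stacks word, topic and document nodes. The edge-weight blocks (`word_word`, `doc_word`, `doc_topic`, `topic_word`) and both function names are hypothetical; the abstract does not specify which node pairs are connected or how edges are weighted (plausible choices would be PPMI, TF-IDF and topic-model distributions).

```python
import numpy as np


def build_adjacency(word_word, doc_word, doc_topic, topic_word):
    """Assemble a symmetric adjacency over nodes ordered [words | topics | docs].

    Hypothetical inputs: word_word (n_words x n_words, symmetric),
    doc_word (n_docs x n_words), doc_topic (n_docs x n_topics),
    topic_word (n_topics x n_words) edge weights.
    """
    word_word, doc_word = np.asarray(word_word, float), np.asarray(doc_word, float)
    doc_topic, topic_word = np.asarray(doc_topic, float), np.asarray(topic_word, float)
    n_words, n_topics, n_docs = word_word.shape[0], topic_word.shape[0], doc_word.shape[0]
    n = n_words + n_topics + n_docs
    w = slice(0, n_words)
    t = slice(n_words, n_words + n_topics)
    d = slice(n_words + n_topics, n)

    a = np.zeros((n, n))
    a[w, w] = word_word
    a[d, w], a[w, d] = doc_word, doc_word.T    # mirror blocks so the graph is undirected
    a[d, t], a[t, d] = doc_topic, doc_topic.T
    a[t, w], a[w, t] = topic_word, topic_word.T
    return a


def gcn_layer(a, h, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W) with self-loops added."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ weight, 0.0)
```

Each layer mixes features from direct neighbors only, so stacking two layers lets a word node receive information from a topic it reaches via a document or another word; this is one reading of the "high-order neighborhood information" in the abstract.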

Keywords: word-topic-document heterogeneous network; word co-occurrence; document-topic distribution; short text classification

Xu Tao, Zhao Xingjia, Lu Min


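College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300

Information Technology Research Base of Civil Aviation Administration of China, Tianjin 300300

Key Laboratory of Airline Artificial Intelligence, Civil Aviation Administration of China, Guangzhou, Guangdong 510000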

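Keywords (Chinese edition): word-topic-document heterogeneous network; word co-occurrence; document-topic distribution; short text classification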

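Funding: Natural Science Foundation of Tianjin; Fundamental Research Funds for the Central Universities; Project of the Key Laboratory of Airline Artificial Intelligence, Civil Aviation Administration of China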

Grant numbers: 18JCYBJC85100; 3122014D032

2024

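Computer Applications and Software
Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology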


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.615
ISSN: 1000-386X
Year, Vol. (No.): 2024, 41(1)