Text Classification Model Fusing Multi-Graph Convolution and Hierarchical Pooling


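Abstract: Text classification is an important problem in natural language processing; its goal is to assign labels to input documents. In text classification, co-occurrence relationships between words offer an important view of text characteristics and vocabulary distribution, while word embeddings provide rich semantic information and influence global word interactions and latent semantic relationships. Previous studies, however, either failed to integrate these two aspects effectively or focused too heavily on one of them. Against this background, the paper proposes a new method that adaptively fuses the two kinds of information, seeking a reasonable balance that improves model performance while accounting for both structural relationships and embedding information. The model first builds a text co-occurrence graph and a text embedding graph from the text data, from the perspectives of word co-occurrence patterns and semantic embedding information respectively. It uses graph convolution to enhance node embeddings; graph pooling layers fuse node embeddings and identify and retain the more important nodes, following a hierarchical pooling scheme that learns document-level representations layer by layer; and a gated fusion module adaptively fuses the embeddings of the two graphs. Extensive experiments on five public text classification datasets show that HTGNN performs strongly on text classification.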
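To make the co-occurrence graph more concrete, the following is a minimal sketch of one common way to derive such a graph from a tokenized document: counting word pairs that fall inside the same sliding window. The abstract does not state the window size or the edge-weighting scheme, so the function name `cooccurrence_edges`, the `window_size` parameter, and the use of raw counts as weights are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(tokens, window_size=3):
    """Count undirected word pairs that share a sliding window.

    window_size is a hypothetical hyperparameter; the abstract gives no value.
    Raw counts stand in for whatever edge weighting the paper actually uses.
    """
    edges = Counter()
    n_windows = max(1, len(tokens) - window_size + 1)
    for start in range(n_windows):
        # Use a set so a word repeated inside one window is counted once.
        window = set(tokens[start:start + window_size])
        # Sort so each undirected pair has a single canonical key.
        for u, v in combinations(sorted(window), 2):
            edges[(u, v)] += 1
    return edges


doc = "graph pooling keeps important graph nodes".split()
edges = cooccurrence_edges(doc, window_size=3)
```

Each key in `edges` is an undirected word pair and each value is the number of windows containing both words; words that never share a window get no edge.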

English title: Text Classification Method Based on Multi Graph Convolution and Hierarchical Pooling

English abstract: Text classification, as a critical task in natural language processing, aims to assign labels to input documents. The co-occurrence relationship between words offers key perspectives on text characteristics and vocabulary distribution, while word embeddings supply rich semantic information, influencing global vocabulary interaction and potential semantic relationships. Previous research has struggled to adequately incorporate both aspects or has disproportionately emphasized one over the other. To address this issue, a novel method is proposed in this paper that adaptively fuses these two types of information, aiming to strike a balance that can improve model performance while considering both structural relationships and embedded information. The method begins by constructing text data into text co-occurrence graphs and text embedding graphs, reflecting the context structure and semantic embedding information respectively. Graph convolution is then utilized to enhance node embeddings. In the graph pooling layer, node embeddings are fused and nodes of higher importance are identified by employing a hierarchical pooling model, learning document-level representations layer by layer. Furthermore, we introduce a gated fusion module to adaptively fuse the embeddings of the two graphs. The proposed approach is validated with extensive experiments on five publicly available text classification datasets, and the experimental results show the superior performance of the HTGNN model in text classification tasks.
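The gated fusion step can be illustrated with a small sketch. The abstract does not give the gate's formula, so what follows is a generic element-wise sigmoid gate over the concatenated vectors, not necessarily the formulation used in HTGNN. The names `gated_fusion`, `h_co`, `h_emb`, `weights`, and `bias` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_co, h_emb, weights, bias):
    """Fuse two equal-length document vectors with an element-wise gate.

    Assumed form (not taken from the paper):
        gate_i  = sigmoid(weights[i] . [h_co; h_emb] + bias[i])
        fused_i = gate_i * h_co[i] + (1 - gate_i) * h_emb[i]
    """
    concat = list(h_co) + list(h_emb)
    fused = []
    for i in range(len(h_co)):
        score = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(score)
        fused.append(gate * h_co[i] + (1.0 - gate) * h_emb[i])
    return fused


h_co = [1.0, 0.0]    # toy representation from the co-occurrence graph
h_emb = [0.0, 1.0]   # toy representation from the embedding graph
zero_w = [[0.0] * 4 for _ in range(2)]
fused = gated_fusion(h_co, h_emb, zero_w, [0.0, 0.0])
```

With zero weights and zero bias the gate is 0.5 everywhere, so the result is the plain average of the two vectors; in a trained model the gate would be learned and could favor either graph per dimension.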

English keywords:

Text classification; Graph neural network; Graph representation learning; Graph classification; Attention mechanism

Authors:

Wei Ziang (魏子昂), Peng Jian (彭舰), Huang Feihu (黄飞虎), Ju Shenggen (琚生根)


Affiliation:

College of Computer Science, Sichuan University, Chengdu 610041

Keywords:

Text classification; graph neural network; graph representation learning; graph classification; attention mechanism

Funding:

Sichuan Province Key R&D Program; Sichuan Province Key R&D Program; Sichuan University-Yibin City Cooperation Project

Project numbers:

2022YFG0034; 2023YFG0115; 2020CDYB-30

Publication year:

2024

DOI:

10.11896/jsjkx.230400164

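Computer Science (计算机科学)

Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)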


CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(7)

Number of references: 1