Deep Document Clustering Model Based on Generalization Graph Convolutional Neural Network (基于泛化图卷积神经网络的深度文档聚类模型)

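Chinese abstract (translated): Text classification is an important task in natural language processing, and text classification based on graph neural networks has become a mainstream approach because it can model multiple kinds of interactions among texts. However, most existing methods rely on labels, and true labels are hard to obtain. This paper proposes a deep document clustering model based on a generalization graph convolutional neural network (generalization graph convolutional neural network-deep document clustering, GGCN-DDC), which performs text representation learning and unsupervised document classification at the same time. The model first represents each document as a text graph; it then uses generalized convolution layers to learn more discriminative word feature representations and document representations; finally, parameter learning is constrained by a document clustering loss and a document graph reconstruction loss. Experiments on three benchmark datasets show that GGCN-DDC outperforms the other baseline algorithms on multiple metrics.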

English title: Deep Document Clustering Model Based on Generalization Graph Convolutional Neural Network

English abstract: Text classification is an important task in natural language processing. Text classification based on graph neural networks has become a mainstream approach because it can model the interactions among texts. However, most existing graph-based classification methods rely on true labels, which are difficult to obtain. A deep document clustering model based on a generalization graph convolutional neural network (GGCN-DDC) is proposed, which performs unsupervised text classification while learning text representations. First, the documents are modeled as a text graph. Then, a generalized convolution layer is used to learn more discriminative word feature representations and document representations. Finally, parameter learning is constrained by a document clustering loss and a document graph reconstruction loss. Experiments on three benchmark datasets show that GGCN-DDC outperforms the other baseline algorithms on several metrics.
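
As a rough illustration of the first two steps named in the abstract (modeling a document as a text graph, then aggregating word features into a document representation), the following is a minimal Python sketch. The sliding-window co-occurrence rule, the mean-aggregation step, and the function names `build_text_graph`, `propagate`, and `readout` are all assumptions made for illustration. The abstract does not specify how GGCN-DDC builds its graphs or defines its generalized convolution layer, and the clustering and reconstruction losses are not shown here.

```python
from collections import defaultdict


def build_text_graph(tokens, window=3):
    """Build an undirected word co-occurrence graph for one document.

    Nodes are the unique tokens; an edge weight counts how often two
    distinct tokens appear within `window` consecutive positions.
    (Sliding-window co-occurrence is an assumption, not the paper's rule.)
    """
    nodes = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(nodes)}
    weights = defaultdict(int)
    for start in range(len(tokens)):
        for offset in range(1, window):
            end = start + offset
            if end >= len(tokens):
                break
            a, b = index[tokens[start]], index[tokens[end]]
            if a != b:
                weights[(min(a, b), max(a, b))] += 1
    return nodes, dict(weights)


def propagate(features, weights):
    """One mean-aggregation step over the graph, with self-loops.

    A plain stand-in for a graph convolution: each node's new vector is
    the weighted average of its own vector and its neighbours' vectors.
    """
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop keeps the node's own feature
    for (a, b), w in weights.items():
        adj[a][b] += w
        adj[b][a] += w
    out = []
    for i in range(n):
        total = sum(adj[i])
        dim = len(features[i])
        out.append([
            sum(adj[i][j] * features[j][d] for j in range(n)) / total
            for d in range(dim)
        ])
    return out


def readout(features):
    """Average the node vectors into a single document vector."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]
```

Calling `build_text_graph` on a token list, passing one feature vector per node (in the order of the returned node list) to `propagate`, and then applying `readout` yields one vector per document. In an unsupervised setting such vectors could then be handed to any clustering routine.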

English keywords: graph neural network; deep graph clustering; text classification; text representation

Authors: Chai Bianfang (柴变芳), Li Zheng (李政), Zhao Xiaopeng (赵晓鹏), Wang Rongjuan (王荣娟)

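Author affiliations:

School of Information Engineering, Hebei GEO University, Shijiazhuang, Hebei 050031

Integrated System Operation and Maintenance Center, Hebei Provincial Department of Finance, Shijiazhuang, Hebei 050091

Hebei Geological Workers' University, Shijiazhuang, Hebei 050086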

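Chinese keywords: graph neural network (图神经网络); deep graph clustering (深度图聚类); text classification (文本分类); text representation (文本表示)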

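Funding: Science and Technology Research Project of Higher Education Institutions of Hebei Province; Hebei GEO University 2023 National Pre-research Project

Project numbers: ZD2020175; KY202310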

Year of publication: 2024

DOI: 10.3969/j.issn.1001-4616.2024.01.010

Journal: Journal of Nanjing Normal University (Natural Science Edition) (南京师大学报(自然科学版))

Nanjing Normal University

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.427

ISSN: 1001-4616

Year, volume (issue): 2024, 47(1)

Number of references: 22