Journal of Nanjing Normal University (Natural Science Edition), 2024, Vol. 47, Issue 1: 82-90. DOI: 10.3969/j.issn.1001-4616.2024.01.010

Deep Document Clustering Model Based on Generalization Graph Convolutional Neural Network

Chai Bianfang 1, Li Zheng 1, Zhao Xiaopeng 2, Wang Rongjuan 3

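Author information

  • 1. School of Information Engineering, Hebei GEO University, Shijiazhuang, Hebei 050031
  • 2. Integrated System Operation and Maintenance Center, Hebei Provincial Department of Finance, Shijiazhuang, Hebei 050091
  • 3. Hebei Geological Workers' University, Shijiazhuang, Hebei 050086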

Abstract

Text classification is an important task in natural language processing. Text classification based on graph neural networks has become a mainstream approach because it can model multiple kinds of interactions among texts. However, most existing methods rely on labels, and true labels are difficult to obtain. This paper proposes a deep document clustering model based on a generalized graph convolutional neural network (generalization graph convolutional neural network-deep document clustering, GGCN-DDC), which performs text representation learning and unsupervised document classification at the same time. The model first represents each document as a text graph. It then uses generalized convolution layers to learn more discriminative word feature representations and document representations. Finally, parameter learning is constrained by a document clustering loss and a document graph reconstruction loss. Experiments on three benchmark datasets show that GGCN-DDC outperforms the other baseline algorithms on several metrics.
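The first two steps named in the abstract (modeling a document as a text graph, then propagating word features over it) can be illustrated with a minimal sketch. This is not the authors' implementation: the sliding-window co-occurrence rule, the window size, the one-hot word features, the standard symmetric-normalized propagation (used here in place of the paper's generalized convolution layer), and the mean readout are all assumptions made for illustration, and every function name is hypothetical. The clustering and reconstruction losses are omitted.

```python
import math


def build_text_graph(tokens, window=3):
    """Assumed construction: nodes are the unique words of one document,
    and an edge links two words that co-occur inside a sliding window."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    adj = [[0.0] * n for _ in range(n)]
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        for a in span:
            for b in span:
                if a != b:
                    adj[index[a]][index[b]] = 1.0
    return vocab, adj


def normalize_adjacency(adj):
    """Add self-loops, then scale each entry by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(norm_adj, features):
    """One parameter-free propagation step: each word averages over its
    neighbourhood (a stand-in for a learned graph convolution layer)."""
    n, dim = len(norm_adj), len(features[0])
    return [[sum(norm_adj[i][k] * features[k][c] for k in range(n))
             for c in range(dim)] for i in range(n)]


def readout(node_features):
    """Mean-pool word features into one document vector (assumed readout)."""
    n = len(node_features)
    return [sum(column) / n for column in zip(*node_features)]


if __name__ == "__main__":
    tokens = "graph neural network for text graph clustering".split()
    vocab, adj = build_text_graph(tokens, window=3)
    one_hot = [[1.0 if i == j else 0.0 for j in range(len(vocab))]
               for i in range(len(vocab))]
    doc_vector = readout(propagate(normalize_adjacency(adj), one_hot))
    print(vocab)
    print(doc_vector)
```

In a trainable model, the document vectors produced by the readout would be the inputs to the clustering objective, and the propagation step would carry learned weights.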

Key words

graph neural network/deep graph clustering/text classification/text representation

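Funding

Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2020175)

Hebei GEO University 2023 National Pre-research Project (KY202310)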

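Publication year

2024

Journal of Nanjing Normal University (Natural Science Edition)
Nanjing Normal University

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.427
ISSN: 1001-4616
Number of references: 22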