Computer Science (计算机科学), 2024, Vol. 51, Issue 11: 157-165. DOI: 10.11896/jsjkx.231000209

Feature Interpolation Based Deep Graph Contrastive Clustering Algorithm

Yang Xihong (杨希洪)¹, Zheng Qun (郑群)², Zhang Jiaxin (章佳欣)¹, Wang Pei (王沛)¹, Zhu En (祝恩)¹

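Author information

  • 1. College of Computer, National University of Defense Technology, Changsha 410073
  • 2. School of Earth and Space Sciences, University of Science and Technology of China, Hefei 230001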

Abstract

Mixup is an effective data augmentation technique in computer vision: it synthesizes new samples by interpolating input images and their labels, thereby expanding the training distribution. In graph node clustering, however, designing an effective interpolation method is challenging because of the irregular topology and connectivity of graph data and the unsupervised setting. To address this problem, we first design encoders with non-shared parameters to obtain the embedding features of different views of the graph, which effectively fuses node feature and structural information. We then introduce Mixup into the clustering task by interpolating the view embeddings together with their corresponding pseudo-labels. To ensure the reliability of the pseudo-labels, we set a threshold to select high-confidence pseudo-labels and update the model parameters with an exponential moving average (EMA), which keeps optimization smooth while taking the training history into account. In addition, we design a graph contrastive learning module that enforces feature consistency across views, reducing information redundancy and improving the discriminative power of the model. Extensive experiments on six datasets demonstrate the effectiveness of the proposed method.
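The three mechanisms named in the abstract (confidence-thresholded pseudo-labels, Mixup on embeddings and pseudo-labels, and EMA parameter updates) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy data, the threshold of 0.9, the decay values, and the choice of Beta(1, 1) for the mixing weight are all assumptions made for illustration.

```python
import random


def select_confident(probs, threshold=0.9):
    """Return indices of samples whose top cluster probability reaches the threshold."""
    return [i for i, p in enumerate(probs) if max(p) >= threshold]


def mixup(x_a, x_b, y_a, y_b, lam):
    """Convexly combine two embeddings and their soft pseudo-labels with weight lam."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def ema_update(avg_params, new_params, decay=0.99):
    """Exponential moving average: keep `decay` of the history, take the rest from new_params."""
    return [decay * a + (1.0 - decay) * n for a, n in zip(avg_params, new_params)]


# Hypothetical toy data: 4 node embeddings and soft assignments over 2 clusters.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.9, 0.1]]
soft_assign = [[0.95, 0.05], [0.08, 0.92], [0.55, 0.45], [0.97, 0.03]]

# Keep only high-confidence pseudo-labels (node 2 is dropped: its top probability is 0.55).
confident = select_confident(soft_assign, threshold=0.9)

# Draw the mixing weight from Beta(alpha, alpha) with alpha = 1 (an assumed setting).
rng = random.Random(0)
lam = rng.betavariate(1.0, 1.0)

# Interpolate a random pair of confident nodes and their pseudo-labels.
i, j = rng.sample(confident, 2)
x_mix, y_mix = mixup(embeddings[i], embeddings[j], soft_assign[i], soft_assign[j], lam)

# One EMA step on a toy two-parameter model.
ema_weights = ema_update([0.0, 0.0], [1.0, 2.0], decay=0.9)
```

Because the mixed pseudo-label is a convex combination of two probability vectors, it remains a valid probability vector, which is what allows it to serve as a soft target for the mixed embedding.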

Key words

Data augmentation/Graph contrastive clustering/EMA/Mixup/Graph neural network

Funding

National Science and Technology Major Project (2022ZD0209103)

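Publication year

2024
Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Number of references: 30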