Path-masked Autoencoder Guiding Unsupervised Attribute Graph Node Clustering

Ding Xinyu (丁新宇)¹, Kong Bing (孔兵)¹, Chen Hongmei (陈红梅)¹, Bao Chongming (包崇明)², Zhou Lihua (周丽华)¹

Author information

1. School of Information, Yunnan University, Kunming 650504
2. School of Software, Yunnan University, Kunming 650504

Abstract

Graph clustering aims to discover the community structure of a network. Current clustering methods struggle to capture the deep latent community information in a network, and they do not integrate feature information appropriately, which leaves the community semantics of nodes unclear. To address these problems, a path-masked autoencoder guiding unsupervised attribute graph node clustering (PAUGC) model is proposed. The model applies random path masking to the network and then uses an autoencoder to mine the network topology in depth, thereby obtaining good global structural semantic information. It uses a normative method to integrate feature information, so that the node features better characterize the class information of the features. In addition, the model incorporates modularity maximization to capture the underlying community information across the whole graph, so that this information can be fused more reasonably into the low-dimensional node features. Finally, self-training clustering iteratively optimizes and updates the clustering representation to obtain the final node features. Extensive experiments comparing PAUGC with 11 classical methods on 8 benchmark datasets demonstrate its effectiveness.
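The abstract names two ingredients, random path masking and modularity, without giving formulas. The following is a minimal sketch of both, not the authors' implementation. It assumes that paths are chosen by short uniform random walks (the abstract does not say how paths are sampled) and uses the standard Newman modularity Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j). The function names `mask_random_paths` and `modularity` and all parameters are hypothetical.

```python
import numpy as np


def mask_random_paths(edges, num_nodes, num_walks, walk_length, seed=0):
    """Mask the edges traversed by a few short random walks.

    Assumption: paths are sampled by uniform random walks; the source
    does not specify the sampling scheme.
    Returns (masked, kept): sets of undirected edges stored as (u, v), u < v.
    """
    rng = np.random.default_rng(seed)
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    all_edges = {(min(u, v), max(u, v)) for u, v in edges}

    masked = set()
    for _ in range(num_walks):
        node = int(rng.integers(num_nodes))  # random start node
        for _ in range(walk_length):
            if not neighbors[node]:
                break  # isolated node: the walk cannot continue
            nxt = neighbors[node][int(rng.integers(len(neighbors[node])))]
            masked.add((min(node, nxt), max(node, nxt)))
            node = nxt
    return masked, all_edges - masked


def modularity(adj, labels):
    """Newman modularity of a hard partition of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # equals 2 * (number of edges)
    same_community = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_m  # null-model edge weight
    return float(((adj - expected) * same_community).sum() / two_m)
```

In a masked-autoencoder setup of this kind, an encoder would see only the `kept` edges and a decoder would be trained to recover the `masked` ones. As a sanity check on the second function, a graph made of two disjoint triangles, labeled as two communities, has modularity 0.5, while putting all six nodes in one community gives 0.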

Key words

Deep graph clustering/Unsupervised learning/Feature information integration/Modularity maximization/Self-training for clustering

Publication year

2025

Computer Science (计算机科学)

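Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)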

Peking University Core Journal (北大核心)

Impact factor: 0.944

ISSN: 1002-137X
