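As a representative approach to network representation learning, graph autoencoder models perform well on link prediction and node classification. When applied to community detection, however, they usually consider only the reconstruction of local node connections and ignore the influence of global community structure. Especially when several communities share overlapping nodes, it is difficult to determine node affiliations and community distributions accurately. To address this problem, this paper proposes an unsupervised modularity-aware graph autoencoder model for overlapping community detection (GAME). GAME adopts an efficient modularity loss function that preserves community relations during network embedding while reconstructing the loss and updating the encoder parameters, which improves the model's performance on overlapping community detection. The community membership matrix produced by GAME is then used to assign communities in probability-node form. GAME is validated experimentally on 10 public datasets and compared with mainstream representation-learning-based overlapping community detection models. The results show that GAME outperforms the mainstream models under the normalized mutual information (NMI) metric, demonstrating the effectiveness of the model.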
Overlapping community detection model based on a modularity-aware graph autoencoder
[Objective] In network science, abstracting complex entity relationships into network structures provides a foundation for understanding real-world interactions. Discovering communities within these networks identifies clusters of closely interconnected nodes, reveals latent patterns and functions, and helps track dynamic network behaviors and assess community influences, which span phenomena from rumor propagation to virus outbreaks and tumor evolution. Communities often overlap, with participants belonging to several communities at once. This adds complexity to the exploration of network structures and makes the discovery of overlapping communities essential for a comprehensive understanding of network structure and functional dynamics.
[Methods] Network representation learning algorithms have significantly enriched community discovery. These algorithms transform complex network information into low-dimensional vectors while preserving the underlying network structure and attribute information. Such representations support downstream graph processing tasks, including link prediction, node classification, and community discovery. Among these algorithms, the graph autoencoder model is a prominent representative that learns network embeddings efficiently and has been applied to diverse community discovery tasks. However, traditional graph autoencoder models focus predominantly on reconstructing local node-edge connections. This focus often overlooks the influence of community structure, particularly when nodes overlap across multiple communities, which makes it difficult to determine node affiliations and community distributions precisely. To address this issue, we introduce an unsupervised modularity-aware graph autoencoder model (GAME) designed for overlapping community discovery. The model incorporates an efficient modularity maximization loss function into the graph autoencoder framework, which preserves community structure throughout the network embedding process. The modularity-aware loss is reconstructed to facilitate the update of encoder parameters, thereby improving model performance on overlapping community discovery tasks. The resulting community membership matrix is used to assign communities to nodes probabilistically.
[Results] GAME was evaluated on six social network datasets (Facebook 348, Facebook 414, Facebook 686, Facebook 698, Facebook 1684, and Facebook 1912), with node counts ranging from 60 to 800, and on four collaborator network datasets (Computer Science, Engineering, Chemistry, and Medicine), with node counts ranging from 1.4 × 10⁴ to 6.4 × 10⁴. Compared with seven prevalent overlapping community discovery methods, including both traditional and graph autoencoder-based algorithms, GAME achieved a 2.1% improvement in normalized mutual information (NMI), confirming the effectiveness of the proposed model.
[Conclusions] By integrating an efficient modularity maximization loss function into the graph autoencoder model, GAME addresses a conventional limitation of graph autoencoders: in community discovery tasks they prioritize the reconstruction of local node connections and often overlook the overall community structure, particularly when nodes overlap. The experimentally validated performance gain underscores GAME's efficacy in overlapping community discovery compared with mainstream methods. However, the model's reliance on substantial memory resources can become a challenge when handling datasets that combine network structure and node attributes. This is especially apparent in small attributed networks (N ≤ 800), where the model is insensitive to variation in the threshold ρ. Future work will focus on refining the model to mitigate these challenges and to perform well across a broader range of real-world scenarios.
community detection; overlapping communities; graph autoencoder; modularity maximization; community membership matrix
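
The abstract names two ingredients, a modularity-based loss and a thresholded community membership matrix, without giving formulas. The following is a minimal sketch of how such quantities are commonly computed. It uses the standard Newman modularity generalized to soft memberships, Q = tr(Fᵀ B F) / (2m) with B = A − d dᵀ / (2m), and is not necessarily the exact loss used in GAME. The function names `soft_modularity`, `modularity_loss`, and `assign_overlapping`, the membership values, and the default `rho=0.5` are assumptions made for illustration only.

```python
def soft_modularity(adj, membership):
    """Soft modularity Q = tr(F^T B F) / (2m), where B = A - d d^T / (2m).

    adj: symmetric adjacency matrix as a list of lists (hypothetical input format).
    membership: n x k matrix F; F[i][c] is node i's affinity to community c.
    """
    n = len(adj)
    k = len(membership[0])
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # equals 2m for an undirected graph

    # tr(F^T A F): membership agreement summed over connected node pairs.
    edge_term = 0.0
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                agreement = sum(membership[i][c] * membership[j][c] for c in range(k))
                edge_term += adj[i][j] * agreement

    # tr(F^T d d^T F) / (2m): expected agreement under the degree-preserving null model.
    null_term = sum(
        sum(degrees[i] * membership[i][c] for i in range(n)) ** 2 for c in range(k)
    ) / two_m

    return (edge_term - null_term) / two_m


def modularity_loss(adj, membership):
    """Negative soft modularity, so that minimizing the loss maximizes modularity."""
    return -soft_modularity(adj, membership)


def assign_overlapping(membership, rho=0.5):
    """Assign node i to every community c with membership[i][c] >= rho.

    A node may receive several communities (overlap) or none.
    """
    return [[c for c, p in enumerate(row) if p >= rho] for row in membership]


# Two disjoint triangles: nodes 0-2 and nodes 3-5.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
hard = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
print(soft_modularity(adj, hard))  # 0.5

# Hypothetical membership rows; the middle node passes the threshold twice.
print(assign_overlapping([[0.9, 0.1], [0.6, 0.55], [0.2, 0.8]], rho=0.5))
```

In a graph autoencoder setting, a term like `modularity_loss` would typically be added to the reconstruction loss and differentiated with respect to the encoder parameters. That requires a tensor library and is omitted here. The threshold `rho` plays the role of ρ in the abstract, although how GAME selects it is not stated.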