首页|基于MapReduce的大规模网络社区发现算法

基于MapReduce的大规模网络社区发现算法

扫码查看
社区发现是社会网络挖掘领域的基本问题.随着海量数据的迅速产生,传统社区发现算法愈发难以处理大规模社会网络.因此,针对大规模网络设计高效的社区发现算法意义重大.文中提出了一种基于MapReduce和k中心聚类的新型分布式算法.首先,该算法提出"朋友圈系数"技术,该技术可更加准确地度量结点间的距离.其次,该算法提出"两阶段k中心聚类"技术,该技术在选取中心点过程中融入结点中心度启发式信息,可显著优化输出结果的模块度.最后,该算法提出"以模块度为优化目标的社区融合"技术,该技术能够在无先验知识的前提下 自动确定网络中的社区数目.实验结果表明,所提算法的社区发现结果模块度明显优于最先进的社区发现算法.例如,相比LPA算法,其将模块度平均提升9.19倍.
Large-scale Network Community Detection Algorithm Based on MapReduce
Community detection is a fundamental problem in the field of social network mining.With the rapid generation of mas-sive data,traditional community detection algorithms are becoming increasingly difficult to handle large-scale social networks.Therefore,it is of great significance to design efficient community detection algorithms for large-scale networks.This paper pro-poses a new distributed algorithm based on MapReduce and k-center clustering.Firstly,the algorithm proposes the"friend circle coefficient"technique,which can measure the distance between nodes more accurately.Secondly,the algorithm proposes the"two-stage k-center clustering"technique,which incorporates node centrality heuristic information into the process of selecting center points and can significantly optimize the modularity of the results.Finally,the algorithm proposes a"community fusion method with modularity as the optimization goal"technique,which can automatically determine the number of communities in the net-work without prior knowledge.The evaluation results show that the proposed algorithm significantly outperforms the state-of-the-art community discovery algorithms in terms of modularity.For example,compared with the LPA algorithm,the proposed al-gorithm increases the modularity by an average of 9.19 times.

Community detectionk-center clusteringDistributed computingData miningBig data

王瀚橙、戴海鹏、陈志鹏、陈树森、陈贵海

展开 >

计算机软件新技术国家重点实验室(南京大学) 南京 210023

社区发现 k中心聚类 分布式计算 数据挖掘 大数据

国家自然科学基金国家自然科学基金国家自然科学基金

62272223U22A203161872178

2024

计算机科学
重庆西南信息有限公司(原科技部西南信息中心)

计算机科学

CSTPCD北大核心
影响因子:0.944
ISSN:1002-137X
年,卷(期):2024.51(4)
  • 69