Overlapping community discovery algorithm based on industrial big data
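Kang Haiyan (康海燕) 1, Jing Wu (景悟) 1, Zhang Yangsen (张仰森) 1
Author information
- 1. School of Information Management, Beijing Information Science and Technology University, Beijing 100192, China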
Abstract
Industrial big data is large in scale, complex in structure, and high in value density. To mine and analyze its hidden relationships, trends, and patterns in depth, and thereby give enterprises a better basis for decision-making, an overlapping community discovery algorithm for industrial big data is proposed that combines the ideas of random walk and label propagation. First, a seed node selection algorithm is designed: the importance of each node is computed by random walk, and seed nodes that are mutually unrelated and of high importance are selected. Then, an overlapping community discovery algorithm is proposed: each seed node is assigned a unique label, and labels are propagated iteratively until the node labels no longer change. The final overlapping community division is obtained from the node labels. Comparative experiments on real and artificial data sets show that the algorithm effectively finds high-quality overlapping communities in networks and can be applied to core problems of industrial big data such as data analysis and information mining.
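The two stages named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: node importance is approximated by a PageRank-style random walk with restart, "unrelated" seeds are taken to mean non-adjacent nodes, seeds keep their own label, and a node keeps every label whose neighbor count reaches `keep_ratio` times the most frequent one. The function names, the `damping` and `keep_ratio` parameters, and the toy graph are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def random_walk_importance(adj, damping=0.85, iters=100):
    """Approximate node importance by a PageRank-style random walk (assumed)."""
    nodes = list(adj)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # An isolated node spreads its mass uniformly over all nodes.
            targets = adj[v] if adj[v] else nodes
            share = damping * score[v] / len(targets)
            for u in targets:
                nxt[u] += share
        score = nxt
    return score


def select_seeds(adj, score, k):
    """Pick up to k high-importance nodes such that no two are adjacent."""
    seeds = []
    for v in sorted(adj, key=lambda x: (-score[x], x)):
        if all(v not in adj[s] for s in seeds):
            seeds.append(v)
        if len(seeds) == k:
            break
    return seeds


def propagate_labels(adj, seeds, keep_ratio=0.5, max_iters=100):
    """Synchronous multi-label propagation; stops when no label set changes."""
    seed_set = set(seeds)
    labels = {v: ({v} if v in seed_set else set()) for v in adj}
    for _ in range(max_iters):
        updated = {}
        for v in adj:
            counts = Counter(l for u in adj[v] for l in labels[u])
            if v in seed_set or not counts:
                updated[v] = labels[v]  # seeds stay fixed (assumption)
                continue
            cutoff = keep_ratio * max(counts.values())
            updated[v] = {l for l, c in counts.items() if c >= cutoff}
        if updated == labels:
            break
        labels = updated
    return labels


def communities(labels):
    """Group nodes by label; a node with several labels is an overlap node."""
    groups = {}
    for v, node_labels in labels.items():
        for l in node_labels:
            groups.setdefault(l, set()).add(v)
    return groups


# Toy graph: two 4-cliques (0-3 and 5-8) joined through node 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
adj = {v: set() for v in range(9)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

seeds = select_seeds(adj, random_walk_importance(adj), k=2)
result = communities(propagate_labels(adj, seeds))
print(sorted(seeds), {l: sorted(c) for l, c in result.items()})
```

On this toy graph the two clique hubs (nodes 3 and 5) are selected as seeds, and the bridging node 4 ends up with both labels, so it belongs to both communities. A synchronous update can oscillate on some graphs, which is why the sketch also caps the number of iterations.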
Key words
industrial big data/community detection/overlapping community/random walk/label propagation
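Funding
National Social Science Fund of China, Annual Project (21BTQ079)
Humanities and Social Sciences Fund of the Ministry of Education of China (20YJAZH046)
Publication year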
2024