
Research on Diversified Top-k Pattern Mining Algorithms on Large Graphs

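Frequent pattern mining (FPM) is an important task in graph data mining. Its goal is to find all patterns in graph data whose frequency exceeds a given threshold. In recent years, with the emergence of large-scale graph data such as social networks, the FPM problem on a single large graph has attracted wide attention, has been studied fairly thoroughly, and has yielded a series of results. However, most existing techniques suffer from high computational cost, mining results that are hard to interpret, and difficulty of parallelization. To address these problems, this paper proposes a method for mining diversified top-k patterns from large-scale graph data. A diversification function is first designed to measure the diversity of a pattern set; then DisTopk, a distributed mining algorithm for distributed graph data with an early-termination property, is designed to mine diversified top-k patterns efficiently. Extensive experiments on real-life and synthetic graph data show that DisTopk mines diversified top-k patterns more efficiently than traditional distributed mining algorithms.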
Diversified Top-k Pattern Mining on Large Graphs
Frequent pattern mining (FPM) is one of the most important problems in graph mining. The FPM problem is defined as mining all patterns whose frequency exceeds a user-defined threshold in a large graph. In recent years, with the popularity of social networks and similar applications, single-graph-based FPM has received increasing attention. Investigators have developed a considerable number of techniques, but most of them suffer from high computational cost, results that are hard to inspect, and difficulty in parallel computation. To tackle these issues, this paper proposes an approach to discover diversified top-k patterns from single large graphs. It first designs a diversification function to measure the diversity of patterns, then develops DisTopk, a distributed algorithm with an early termination property, to efficiently identify diversified top-k patterns from graphs stored in a distributed manner. Experimental results on real-life and synthetic graphs show that DisTopk mines diversified top-k patterns more efficiently than traditional algorithms.
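The abstract does not give the form of the diversification function, so the following is only a minimal sketch of the general idea of diversified top-k selection, not the paper's method. It assumes a commonly used bi-criteria objective that combines relevance (here, normalized support) with pairwise dissimilarity (here, Jaccard distance over hypothetical pattern feature sets), weighted by a parameter `lam`. The function names, the feature representation, and the greedy strategy are all illustrative assumptions; the distributed execution and early termination of DisTopk are not reproduced.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Dissimilarity of two patterns, each given as a set of features (assumed representation)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def objective(selected, support, features, k, lam):
    """Assumed bi-criteria score: (1 - lam) * total support + scaled pairwise distance."""
    relevance = sum(support[p] for p in selected)
    spread = sum(
        jaccard_distance(features[p], features[q])
        for p, q in combinations(selected, 2)
    )
    # Scale the distance term so both terms grow comparably with k.
    scale = 2.0 * lam / (k - 1) if k > 1 else 0.0
    return (1.0 - lam) * relevance + scale * spread


def greedy_diversified_top_k(support, features, k, lam=0.5):
    """Greedily add the pattern that most increases the objective until k are chosen."""
    selected = []
    candidates = set(support)
    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(
            sorted(candidates),
            key=lambda p: objective(selected + [p], support, features, k, lam),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy input: A and B are frequent but identical in features; C is rarer but different.
support = {"A": 0.9, "B": 0.85, "C": 0.5}
features = {"A": {"x", "y"}, "B": {"x", "y"}, "C": {"z"}}
print(greedy_diversified_top_k(support, features, k=2, lam=0.5))
print(greedy_diversified_top_k(support, features, k=2, lam=0.0))
```

With `lam=0.0` the selection reduces to plain top-k by support, while a larger `lam` favors patterns that differ from those already chosen. This is the trade-off a diversification function is meant to control.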

Frequent pattern mining; Top-k patterns; Result diversification; Distributed mining; Early termination

He Yu'ang (何宇昂), Wang Xin (王欣), Shen Lingzhen (沈玲珍)


School of Computer Science, Southwest Petroleum University, Chengdu 610500, China

Frequent pattern mining; Top-k patterns; Result diversity; Distributed mining; Early termination

Sichuan Science and Technology Innovation Talent Fund

2022JDRC0009

2024

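Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)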


CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(5)