
Distributed Parallel Construction Algorithm for Triadic Concepts

As an extension of formal concept analysis, triadic concept analysis has achieved notable results in both the theory and applications of high-dimensional data. However, as data volume grows rapidly, the running time of triadic concept generation algorithms grows exponentially, which poses major challenges in practical applications and makes parallel algorithms necessary. In this paper, a distributed parallel construction algorithm for triadic concepts, suitable for large-scale data, is proposed. First, the theory of object-attribute triadic concepts and attribute-condition triadic concepts is presented, and it is proved that all triadic concepts can be generated by merging these two types of intermediate concepts. Second, a two-stage aggregation strategy is adopted to improve the resilient distributed dataset (RDD) operators in the Spark framework, which effectively addresses the data skew problem and markedly improves the running efficiency of the algorithm. Finally, experiments on multiple public datasets show that the proposed algorithm generates triadic concepts efficiently on massive data.
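
The two-stage aggregation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's modified Spark operator; it is plain Python that mimics the common "salting" pattern under stated assumptions: the function name `two_stage_aggregate`, the `num_salts` parameter, and the use of integer sums as the aggregated value are all hypothetical choices made for illustration. In the first stage each key is paired with a random salt so that a heavily loaded key is split across several buckets; in the second stage the salt is dropped and the partial results are merged.

```python
import random
from collections import defaultdict


def two_stage_aggregate(pairs, num_salts=4, seed=0):
    """Sum values per key in two stages: salted partial sums, then a final merge.

    Illustrative only: names and parameters are assumptions, not the paper's API.
    """
    rng = random.Random(seed)

    # Stage 1: attach a random salt to every key, so a hot key is spread over
    # up to `num_salts` buckets, and aggregate per (key, salt) pair.
    partial = defaultdict(int)
    for key, value in pairs:
        partial[(key, rng.randrange(num_salts))] += value

    # Stage 2: drop the salt and merge the partial results per original key.
    final = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        final[key] += subtotal
    return dict(final)


# A skewed input: one key dominates the data set.
pairs = [("hot", 1)] * 1000 + [("cold", 1)] * 3
result = two_stage_aggregate(pairs)
```

On a cluster, the same pattern is typically expressed as a keyed reduction over the salted keys followed by a second keyed reduction over the original keys, so that no single task has to process every record of a dominant key. How the paper adapts the RDD operators to merge intermediate concepts is not specified in the abstract.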

Formal Concept, Triadic Concept, Distributed Parallelization, Two-Stage Aggregation, Data Skew

Li Jinhai (李金海), Wang Kun (王坤), Chen Qiangqiang (陈强强)


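Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500

Data Science Research Center, Kunming University of Science and Technology, Kunming 650500

Faculty of Science, Kunming University of Science and Technology, Kunming 650500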


2024

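Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Chinese Association of Automation; National Research Center for Intelligent Computing Systems; Hefei Institute of Intelligent Machines, Chinese Academy of Sciences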

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.954
ISSN: 1003-6059
Year, volume (issue): 2024, 37(10)