基于邻接比特压缩表的频繁闭项集挖掘算法
A High Spatial Efficiency Algorithm for Frequent Closed Items Mining with Compressed Adjacency Byte Table
杨博超 1吴美璇 1胡浩 1朱敏1
作者信息
- 1. 四川大学计算机学院,四川 成都 610065
- 折叠
摘要
频繁闭项集(Frequent Closed Items,FCI)是一种表示事物之间关联关系的有效方式,它能克服频繁项集(Frequent Items,FI)信息冗余的缺点.FCI挖掘算法研究旨在以更优的时空效率,在原始数据集中找到所有的FCI.相关研究成果重在关注时间效率的提升,但空间效率欠佳.提出一种高空间压缩率数据结构——邻接比特压缩表(Compressed Adjacency Byte table,Cab-table),将项集与交易集压缩到剔除全部0 之后的比特表中,使空间高度压缩.基于此数据结构的频繁闭项集挖掘算法(Cab-Miner),采用运算栈与检索栈来实现非递归方式的频繁闭项集挖掘,相较于之前普遍采用递归方式的算法,在理论上可使空间占用率由O(L∗N+M)降为O(3N).基于公开数据集与真实数据集的实验表明,上述算法在原始数据集压缩,以及运算内存消耗上,都有较优的表现,尤其在处理真实数据集时,空间表现极佳.另外在某些属性的数据集上也表现出优越的时间效率.
Abstract
Frequent closed items(FCI)is an effective data structure for expressing the relationships among things,which can overcome the defect of frequent items(FI)in information redundancy.The purpose of study in fre-quent closed items mining is to find all frequent closed items in the original dataset with higher spatiotemporal effi-ciency.Unfortunately,lots of studies focused on the time efficiency improving but lost sight of spatial optimization.We proposed a data structure(Cab-table:Compressed Adjacency Byte table)for data compression efficiently,which can compress item set and transaction set into byte table without zero.Furthermore,we proposed a FCI mining algorithm with Cab-table,called Cab-Miner.In the algorithm we designed one retrieve stack and one operation stack to realize the non-recursive FCI mining,which is different from most of other algorithms.Compared to the recursive algorithm,our algorithm's space efficiency is O(2N+M)instead of O(LN+M).Several experiments were carried out with public data and real data,then we proved that our algorithm has better space occupation performance in initial data set com-pression and operation memory consumption,especially when the data set is collected from real scenes.Additionally,Cab-Miner also consumes lower time process in some data set with special properties.
关键词
频繁闭项集/邻接比特压缩表/非递归算法/高空间效率Key words
Frequent closed items/Compressed adjacency byte table/Non-recursive algorithm/High spatial effi-ciency引用本文复制引用
出版年
2024