Research on Data Deduplication of Power Material Supply Chain Based on Feature Iteration

Existing deduplication methods for power material supply chain data either deduplicate incompletely or delete valid records. To improve deduplication efficiency and performance, a deduplication method for power material supply chain data based on feature iteration is proposed. The method uses feature iteration to preprocess the data through feature extraction and feature classification, which reduces the data volume in advance and lowers both the difficulty and the computational cost of deduplication. It then computes the similarity between the preprocessed records, uses the Counting Bloom Filter algorithm to identify the records whose similarity qualifies them for deletion, and deletes them, thereby deduplicating the power material supply chain data. Experimental results show that the proposed method uses little storage space, deduplicates effectively, and requires little time.
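
The abstract names the Counting Bloom Filter as the structure used to decide which records to delete, but does not give its parameters or how it is combined with the similarity scores. The following is a minimal generic sketch of a Counting Bloom Filter applied to exact-match deduplication of record keys, not the authors' implementation. The class name `CountingBloomFilter`, the function `deduplicate`, the default sizes, the SHA-256 based hashing, and the idea of a record key built from extracted features are all assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose slots are counters, so items can also be removed."""

    def __init__(self, num_counters: int = 1024, num_hashes: int = 3) -> None:
        # Sizes are illustrative assumptions, not values from the paper.
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _indexes(self, item: str) -> list[int]:
        # Derive num_hashes slot indexes by salting SHA-256 with the hash number.
        indexes = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            indexes.append(int.from_bytes(digest[:8], "big") % self.num_counters)
        return indexes

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def remove(self, item: str) -> None:
        # Decrement only when the item appears to be present; otherwise the
        # counters belonging to other items would be corrupted.
        if item in self:
            for idx in self._indexes(item):
                self.counters[idx] -= 1

    def __contains__(self, item: str) -> bool:
        return all(self.counters[idx] > 0 for idx in self._indexes(item))


def deduplicate(record_keys: list[str], cbf: CountingBloomFilter) -> list[str]:
    """Keep the first occurrence of each key and drop probable repeats."""
    kept = []
    for key in record_keys:
        if key in cbf:
            continue  # probably seen before (false positives are possible)
        cbf.add(key)
        kept.append(key)
    return kept
```

Calling `deduplicate` on a list of hypothetical keys such as `"cable-10kV|A"` keeps the first occurrence of each key. Because a Bloom filter can report false positives, a real pipeline would likely confirm a reported duplicate (for example with the similarity check the abstract describes) before deleting a record; the counters are what make later removal of a key possible, which a plain bit-array Bloom filter does not support.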

Keywords: feature iteration; preprocessing; data deduplication; similarity calculation; feature extraction

王艳艳、金义、钱诚、许晓艺

Materials Branch, State Grid Anhui Electric Power Co., Ltd., Hefei, Anhui 230061, China

2024

Microcomputer Applications (微型电脑应用)
Shanghai Microcomputer Application Society (上海市微型电脑应用学会)

CSTPCD
Impact factor: 0.359
ISSN: 1007-757X
Year, volume (issue): 2024, 40(4)