Variable Granularity-based Chunk-context Aware Similar Data Deduplication Technique for Cloud Storage
To address the limited effectiveness of existing similar data deduplication techniques and their high metadata overhead in cloud storage environments, a variable granularity-based chunk-context aware similar data deduplication technique for cloud storage is proposed. The technique adopts a feature extraction algorithm based on sub-block reorganization to extract initial features from the internal structure of the data block content, and uses a BP (Back Propagation) neural network context-aware model to embed the contextual feature information of each data block into its initial features, yielding variable granularity data blocks with contextual semantic embedding. A better representation of similar data blocks is obtained by controlling the data block size: neighboring similar data blocks or neighboring non-redundant data blocks are merged dynamically to reduce metadata overhead, and the transition region located between similar and non-redundant data blocks is segmented. Finally, to evaluate its performance, a prototype variable granularity similar data detection algorithm, rCARD, is implemented and evaluated extensively on real-world datasets. The experimental results show that, compared with the latest similarity detection deduplication technique, Finesse, rCARD achieves a higher deduplication rate while significantly reducing the metadata size, and speeds up similarity detection by up to 11.07 times.
similar data deduplication; data block semantics; variable granularity; cloud storage; metadata
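The merging step described in the abstract, in which neighboring data blocks of the same kind are combined so that fewer metadata entries are needed, can be illustrated with a minimal sketch. This is not the rCARD implementation: the `Block` record, the `"similar"` and `"unique"` labels, the `merge_neighbors` function, and the `max_size` cap are all hypothetical names introduced here, and the labels are assumed to have been produced already by some similarity detector.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical metadata record for one data block."""
    offset: int   # start position in the data stream
    length: int   # block size in bytes
    label: str    # assumed label: "similar" or "unique" (non-redundant)


def merge_neighbors(blocks: List[Block], max_size: int) -> List[Block]:
    """Merge adjacent blocks that share a label, up to max_size bytes.

    Each merge replaces two metadata records with one, which is the
    source of the metadata saving in this sketch.
    """
    merged: List[Block] = []
    for b in blocks:
        last = merged[-1] if merged else None
        if (last is not None
                and last.label == b.label                  # same kind of block
                and last.offset + last.length == b.offset  # physically adjacent
                and last.length + b.length <= max_size):   # respect the size cap
            merged[-1] = Block(last.offset, last.length + b.length, last.label)
        else:
            merged.append(Block(b.offset, b.length, b.label))
    return merged


blocks = [
    Block(0, 4, "similar"),
    Block(4, 4, "similar"),
    Block(8, 4, "unique"),
    Block(12, 4, "unique"),
    Block(16, 4, "similar"),
]
merged = merge_neighbors(blocks, max_size=16)
```

In this example the five input records collapse into three. The size cap stands in for the block size control mentioned in the abstract; how the actual technique chooses block sizes and segments transition regions is not specified here.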