
Research Progress on Optimization Techniques of Tiered Storage Based on Deduplication

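With the explosive growth of global data volume and the increasing diversity of data, storage systems built on a single media layer can no longer meet users' diverse application demands. Tiered storage classifies data by characteristics such as importance, access frequency, and security requirements, and places it in storage tiers that differ in access latency, storage capacity, and fault tolerance; it has been widely adopted in many fields. Deduplication is a data reduction technique for big data that efficiently removes duplicate data from a storage system and maximizes storage space utilization. Unlike the single-tier case, applying deduplication to tiered storage not only reduces cross-tier data redundancy, further saving storage space and lowering storage costs, but also improves data I/O performance and storage device endurance. After briefly analyzing the principles, workflow, and classification of deduplication-based tiered storage, this paper focuses on three key steps (storage location selection, duplicate content identification, and data migration), surveys the research progress of the corresponding optimization methods in depth, and discusses the potential technical challenges of deduplication-based tiered storage. Finally, it outlines the future development trends of deduplication-based tiered storage.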
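To make the three key steps concrete, the following is a minimal, hypothetical Python sketch, not a method from the surveyed work. It assumes fixed-size 4 KiB chunks, SHA-256 chunk fingerprints, one global fingerprint index shared by two tiers (named `fast` and `capacity` here for illustration), and a simple access-count threshold for tier placement. The class `DedupTieredStore` and all of its methods are illustrative names only. Because the index spans both tiers, a chunk that already exists in either tier is never stored a second time, and migration only moves chunks that are not yet in the target tier.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical fixed-size chunking; real systems often use content-defined chunking.
CHUNK_SIZE = 4096


@dataclass
class Tier:
    name: str
    chunks: dict = field(default_factory=dict)  # fingerprint -> chunk bytes


class DedupTieredStore:
    """Toy two-tier store with one cross-tier fingerprint index (illustrative only)."""

    def __init__(self, hot_threshold=3):
        self.tiers = {"fast": Tier("fast"), "capacity": Tier("capacity")}
        self.index = {}    # fingerprint -> name of the tier holding the single copy
        self.files = {}    # file name -> ordered list of fingerprints (file recipe)
        self.access = {}   # file name -> read count
        self.hot_threshold = hot_threshold

    @staticmethod
    def fingerprint(chunk: bytes) -> str:
        return hashlib.sha256(chunk).hexdigest()

    def select_tier(self, name: str) -> str:
        # Step 1, storage location selection: frequently read files go to the fast tier.
        return "fast" if self.access.get(name, 0) >= self.hot_threshold else "capacity"

    def write(self, name: str, data: bytes) -> int:
        """Store a file and return how many new unique chunks were physically written."""
        tier = self.select_tier(name)
        recipe, written = [], 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = self.fingerprint(chunk)
            # Step 2, duplicate content identification: look up the cross-tier index,
            # so a chunk already stored in ANY tier is not stored again.
            if fp not in self.index:
                self.tiers[tier].chunks[fp] = chunk
                self.index[fp] = tier
                written += 1
            recipe.append(fp)
        self.files[name] = recipe
        return written

    def read(self, name: str) -> bytes:
        self.access[name] = self.access.get(name, 0) + 1
        return b"".join(self.tiers[self.index[fp]].chunks[fp] for fp in self.files[name])

    def migrate(self, name: str) -> int:
        """Step 3, data migration: move the file's unique chunks to its selected tier.

        Returns the number of chunks moved; chunks already in the target tier stay put.
        """
        target = self.select_tier(name)
        moved = 0
        for fp in set(self.files[name]):
            src = self.index[fp]
            if src != target:
                self.tiers[target].chunks[fp] = self.tiers[src].chunks.pop(fp)
                self.index[fp] = target
                moved += 1
        return moved
```

For example, writing `b"A" * 8192` as `f1` stores one unique chunk, and then writing the same bytes plus `b"B" * 4096` as `f2` stores only the new `B` chunk. After `f1` is read twice with `hot_threshold=2`, `migrate("f1")` moves the shared `A` chunk to the fast tier, and `f2` still reads correctly through the global index. Note the trade-off this toy design makes: promoting a chunk shared by a hot and a cold file also places the cold file's copy in the fast tier, which is one of the placement questions that deduplication-aware tiering must resolve.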

Keywords: Deduplication; Tiered storage; Storage location selection; Duplicate content identification; Data migration

Yao Zilu, Fu Yinjin, Xiao Nong


College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China

National Supercomputer Center in Guangzhou, Sun Yat-sen University, Guangzhou 510006, China


2025

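Computer Science
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)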


Peking University Core Journal
Impact factor: 0.944
ISSN: 1002-137X
Year, Volume (Issue): 2025, 52(1)