首页|面向内存数据库的类字典树索引综述与性能比较

面向内存数据库的类字典树索引综述与性能比较

扫码查看
如何快速存取海量数据是大数据时代数据库系统面临的重大挑战.利用大内存构建内存数据库系统是实现大数据实时存取的可行途径.在此背景下,用于加速内存数据存取的内存数据库索引成为近几年国内外的研究热点.但是,内存数据库索引也面临着诸多挑战.以常见的内存B+树索引为例,第一个问题是索引的空间效率较低,这是因为内存B+树索引的节点内部存在较大的空间浪费;第二个问题是索引的查询复杂度较高,B+树的查询复杂度受限于数据规模,随着数据规模的扩张,索引的搜索效率也会下降;第三个问题是变长数据支持弱,B+树对于变长键的支持比较差,往往难以适应实际应用的需要.近年来,由于字典树具有空间代价低、查询效率与数据规模无关、支持变长键等优点,逐步成为了内存数据库索引研究中的一个主要方向.本论文围绕面向内存数据库的类字典树索引,首先介绍了字典树的概念、特点和历史,然后系统梳理和总结了类字典树索引的现状和最新进展,之后提出了一种全新的分类方法对类字典树索引进行了分类.在此基础上,论文对主流的六种类字典树索引进行了实验,在多个数据集和负载上进行了性能对比,并基于实验结果讨论了类字典树索引的设计和使用建议,最后展望了未来类字典树索引的发展方向.
Review and Performance Comparison of Trie-like Indexes for Main-Memory Databases
Accelerating query performance over massive data has become a major challenge in da-tabase systems in the big data era.Building main-memory database systems on top of large mem-ory has been a feasible way to support real-time big data access.Accordingly,main-memory in-dexes for accelerating data access have been a research focus in recent years.However,main-memory indexes face a few challenges.Taking the main-memory B+tree as an example,the first issue is its low space efficiency stemming from the space overheads incurred by the internal nodes.Second,the B+tree's query complexity directly correlates with data size,leading to di-minishing search efficiency as dataset volumes escalate.Third,the B+tree cannot deal with vari-able-length keys efficiently,failing to meet the real requirements of applications.In recent years,trie-like indexes(also called trie indexes)have emerged as a focal point in main-memory index re-search due to their advantages of low space cost,high data-scale-independent query efficiency,and supporting variable-length keys.This paper focuses on trie-like indexes for main-memory da-tabases.We first introduce some foundational aspects of trie indexes,including their concept,characteristics,and history.Then,we summarize the current achievements and advances of trie-like indexes.Next,we provide a new taxonomy to classify trie-like indexes,based on which we implement six prominent trie-like indexes and evaluate their performance on various datasets and workloads.Further,we present some suggestions for the design and use of trie-like indexes based on the experimental results.Finally,we discuss the future development of trie-like inde-xes.

main-memory databasetrie indexperformance comparison

储召乐、罗永平、金培权

展开 >

中国科学技术大学计算机科学与技术学院 合肥 230027

中国科学院电磁空间信息重点实验室 合肥 230027

内存数据库 字典树索引 性能对比

国家自然科学基金面上项目

62072419

2024

计算机学报
中国计算机学会 中国科学院计算技术研究所

计算机学报

CSTPCD北大核心
影响因子:3.18
ISSN:0254-4164
年,卷(期):2024.47(9)
  • 1