面向Select和Sort的数据库算子缓存的设计与实现

扫码查看

原文链接

万方数据
维普

中文摘要：缓存是数据库中提高查询性能的一种常用技术.目前,现有数据库缓存主要有两个方向:查询结果缓存和存储层块缓存.查询结果缓存是利用数据库查询执行的最终结果或中间结果(如子查询),而存储层块缓存则缓存查询涉及的底层数据块.本文从另外一个角度"缓存中含有的计算量"来重新审视缓存在查询优化中的应用,并以此为基础进一步划分数据库缓存方式.在查询执行过程中,数据库查询被转换成一系列操作(例如选择、排序等)的集合,而算子对应操作.查询处理中算子输出的数据为中间结果,含有部分计算量,我们将这部分数据进行缓存并加以利用.我们将这种缓存部分计算量的缓存方式称为算子缓存,即缓存每个操作执行后的结果.由于不同查询之间可能会存在相同算子,对相近数据执行相同计算,因此利用算子缓存加速查询执行性能具有相当大的潜力.本文的新颖之处在于从缓存含有的计算量角度出发,提出并研究算子缓存如何在查询优化中应用.本文以Filter、Sort算子为例,针对缓存复用提出了一种基于语义树的匹配算法,用于快速匹配缓存中的结果集.同时,针对复用缓存可能劣化查询性能的情况,提出使用基于成本的代价优化器防止使用缓存劣化查询性能.最后,本文基于开源分析型数据库ClickHouse实现了 Filter、Sort算子缓存的原型,并对提出的算子缓存方案进行了大量的实验测试.结果表明,相比块缓存、物化视图方式,本文提出的算子缓存方案在本地SSD部署下最大能够分别提升9倍以及1.5倍的查询响应速度,在云环境下部署能够分别提升30倍以及2倍的查询响应速度.

外文标题：Design and Implementation of Database Operator Cache for Select and Sort

外文摘要：Caching is a commonly used technique in databases to enhance query performance.Currently,existing database caching primarily falls into two directions:result caching and block caching.Result caching involves utilizing the final or intermediate results(such as subqueries)obtained during the execution of database queries,while block caching stores the underlying data blocks involved in the queries.This paper takes a different perspective,focusing on the'compu-tational load'contained within the cache,to reexamine the application of caching in query optimi-zation.Building on this,the paper further classifies database caching methods.During query exe-cution,a database query is transformed into a collection of operations(such as selection,sorting,etc.),with each operation corresponding to an operator.The data output by operators during query processing forms intermediate results,containing a portion of the computational load.We cache and leverage this subset of data.This caching approach,which caches a portion of the com-putational load,is termed'operator caching',specifically caching the results of each operation execution.Due to the potential presence of common operators across different queries,perform-ing similar computations on similar data,leveraging operator caching holds significant potential for accelerating query execution performance.The novelty of this paper lies in its exploration of how operator caching can be applied in query optimization,viewed from the perspective of the computational load contained in the cache.Using Filter and Sort operators as examples,we pro-pose and investigate how operator caching can be employed in query optimization.For cache re-use,we introduce a semantic tree-based matching algorithm designed to efficiently match result sets in the cache.Simultaneously,to address the potential degradation of query performance caused by reusing the cache,we suggest the use of a cost-based optimizer to prevent the degrada-tion of query performance.Finally,based on the open-source analytical database ClickHouse,this paper implements a prototype of the Filter and Sort operator caching and conducts extensive experimental testing of the proposed operator caching scheme.The results indicate that,com-pared to block caching and materialized view approaches,the operator caching solution proposed in this paper can achieve a maximum improvement of 9 times and 1.5 times in query response speed when deployed on a local SSD.When deployed in a cloud environment,the operator cac-hing approach can achieve respective improvements of 30 times and 2 times in query response speed.

外文关键词：

databasequery executionquery optimizationoperator cacheOLAP

作者：

蔡万里、王新硕、胡卉芪、蔡鹏、周烜、屠要峰

展开 >

作者单位：

华东师范大学数据科学与工程学院上海 200062

中兴通讯股份有限公司南京 210012

关键词：

数据库查询执行查询优化算子缓存联机分析处理

基金：

国家自然科学基金上海市自然科学基金中兴通讯研究基金

项目编号：

9227020223ZR1418300HC-CN-20220721010

出版年：

2024

DOI：

10.11897/SP.J.1016.2024.02084

计算机学报

中国计算机学会中国科学院计算技术研究所

计算机学报

CSTPCD北大核心

影响因子：3.18

ISSN：0254-4164

年,卷(期)：2024.47(9)

参考文献量4