首页期刊导航|ACM Transactions on Database Systems
期刊信息/Journal information
ACM Transactions on Database Systems
Association for Computing Machinery
ACM Transactions on Database Systems

Association for Computing Machinery

季刊

0362-5915

ACM Transactions on Database Systems/Journal ACM Transactions on Database SystemsSCIISTPEI
正式出版
收录年代

    Tight Fine-Grained Bounds for Direct Access on Join Queries

    KARL BRINGMANNNOFAR CARMELISTEFAN MENGEL
    1.1-1.44页
    查看更多>>摘要:We consider the task of lexicographic direct access to query answers. That is, we want to simulate an arraycontaining the answers of a join query sorted in a lexicographic order chosen by the user. A recent dichotomyshowed for which queries and orders this task can be done in polylogarithmic access time after quasilinearpreprocessing, but this dichotomy does not tell us how much time is required in the cases classified as hard.We determine the preprocessing time needed to achieve polylogarithmic access time for all join queries and alllexicographical orders. To this end, we propose a decomposition-based general algorithm for direct access onjoin queries.We then explore its optimality by proving lower bounds for the preprocessing time based on thehardness of a certain online Set-Disjointness problem, which shows that our algorithm’s bounds are tight forall lexicographic orders on join queries. Then, we prove the hardness of Set-Disjointness based on the Zero-Clique Conjecture, which is an established conjecture from fine-grained complexity theory. Interestingly,while proving our lower bound, we show that self-joins do not affect the complexity of direct access (upto logarithmic factors). Our algorithm can also be used to solve queries with projections and relaxed orderrequirements, though in these cases, its running time is not necessarily optimal. We also show that similartechniques to those used in our lower bounds can be used to prove that, for enumerating answers to Loomis-Whitney joins, it is not possible to significantly improve upon trivially computing all answers at preprocessing.This, in turn, gives further evidence (based on the Zero-Clique Conjecture) to the enumeration hardness ofself-join-free cyclic joins with respect to linear preprocessing and constant delay.

    Marrying Top-k with SkylineQueries: Operators with Relaxed Preference Input and Controllable Output Size

    KYRIAKOS MOURATIDISKEMING LIBO TANG
    2.1-2.37页
    查看更多>>摘要:The two paradigms to identify records of preference in a multi-objective setting rely either on dominance(e.g., the skyline operator) or on a utility function defined over the records’ attributes (typically using a top-kquery). Despite their proliferation, each has its own palpable drawbacks. Motivated by these drawbacks, weidentify three hard requirements for practical decision support, namely, personalization, controllable outputsize, and flexibility in preference specification. With these requirements as a guide, we combine elementsfrom both paradigms and propose two new operators, ORD and ORU. We present a suite of algorithms fortheir efficient processing, dedicating more technical effort to ORU, whose nature is inherently more challenging.Specifically, besides a sophisticated algorithm for ORD, we describe two exact methods for ORU andone approximate. We perform a qualitative study to demonstrate how our operators work and evaluate theperformance of our algorithms against adaptations of previous work that mimic their output.

    A Caching-based Framework for Scalable Temporal Graph Neural Network Training

    YIMING LIYANYAN SHENLEI CHENMINGXUAN YUAN...
    3.1-3.46页
    查看更多>>摘要:Representation learning over dynamic graphs is critical for many real-world applications such as socialnetwork services and recommender systems. Temporal graph neural networks (T-GNNs) are powerful representationlearning methods and have demonstrated remarkable effectiveness on continuous-time dynamicgraphs. However, T-GNNs still suffer from high time complexity, which increases linearly with the numberof timestamps and grows exponentially with the model depth, making them not scalable to large dynamicgraphs. To address the limitations, we propose Orca, a novel framework that accelerates T-GNN training bycaching and reusing intermediate embeddings. We design an optimal caching policy, named MRD, for theuniform cache replacement problem, where embeddings at different intermediate layers have identical dimensionsand recomputation costs. MRD not only improves the efficiency of training T-GNNs by maximizing thenumber of cache hits but also reduces the approximation errors by avoiding keeping and reusing extremelystale embeddings. For the general cache replacement problem, where embeddings at different intermediatelayers can have different dimensions and recomputation costs, we solve this NP-hard problem by presentinga novel two-stage framework with approximation guarantees on the achieved benefit of caching. Furthermore,we have developed profound theoretical analyses of the approximation errors introduced by reusingintermediate embeddings, providing a thorough understanding of the impact of our caching and reuseschemes on model outputs. We also offer rigorous convergence guarantees for model training, adding to thereliability and validity of our Orca framework. Extensive experiments have validated that Orca can obtaintwo orders of magnitude speedup over state-of-the-art T-GNNs while achieving higher precision on variousdynamic graphs.

    Synchronizing Disaggregated Data Structures with One-Sided RDMA: Pitfalls, Experiments and Design Guidelines

    MATTHIAS JASNYTOBIAS ZIEGLERJACOB NELSON-SLIVONVIKTOR LEIS...
    4.1-4.40页
    查看更多>>摘要:Remote data structures built with one-sided Remote Direct Memory Access (RDMA) are at the heart of manydisaggregated database management systems today. Concurrent access to these data structures by thousandsof remote workers necessitates a highly efficient synchronization scheme. Remarkably, our investigation revealsthat existing synchronization schemes display substantial variations in performance and scalability.Even worse, some schemes do not correctly synchronize, resulting in rare and hard-to-detect data corruption.Motivated by these observations, we conduct the first comprehensive analysis of one-sided synchronizationtechniques and provide general principles for correct synchronization using one-sided RDMA. Our researchdemonstrates that adherence to these principles not only guarantees correctness but also results in substantialperformance enhancements. This article is an extended version of [72] in which we investigate modern400G NICs. Our findings reveal that the challenges persist even with new generations of NICs. Consequently,we turn our attention to alternative networking hardware, such as smart switches, to address some of thelimitations associated with one-sided synchronization.