Abstract: The current issue contains papers from the ACM SIGMETRICS / IFIP Performance 2024 Student Research Competition held in Venice, Italy in June 2024. There is also one regular paper included. I would like to thank the authors for their contributions, and Nicolas Gast and Lishan Yang for preparing the special issues. PER preprints can be downloaded from the preprint queue page. You may also consider following the SIGMETRICS LinkedIn group for PER-related announcements.
Abstract: Every year, the Association for Computing Machinery (ACM) spearheads a series of Student Research Competitions (SRCs) at ACM-sponsored or co-sponsored conferences. These SRCs provide graduate (Master's or PhD program) and undergraduate students with an opportunity to present one of their research projects at the conference. At ACM SIGMETRICS, the competition was held as follows. First, students submitted an extended abstract to a program committee, which selected a subset of the abstracts to be presented as posters at the conference. A few of these posters were then selected for an oral presentation. Participating students have the opportunity to win cash prizes. In each category (graduate and undergraduate), the 1st, 2nd, and 3rd place winners receive prizes of $500, $300, and $200, respectively. Additionally, first-place undergraduate and graduate student winners from the SRCs held during the year are invited to present their work at the SRC Grand Finals, where a different panel of judges evaluates these winners against each other.
Abstract: We propose Aristos, a communication-optimal, distributed algorithm that interleaves asynchronous communication with computation to specifically tackle the communication overhead of Distributed Mixture-of-Experts (DMoE) transformer models. DMoE as implemented today entails frequent synchronous All-to-All communication operations whose number scales linearly per step with the model's number of layers. Thus, we first investigate how All-to-All bottlenecks DMoE and whether the interconnect or the communication algorithm is at fault. We examine more than 21k All-to-All CUDA kernels and empirically confirm that their runtimes exhibit a significant long-tail distribution in both multi-node and single-node settings. We argue that this phenomenon is a shortcoming of the global barrier necessitated by the synchronous implementation of All-to-All in state-of-the-art collective libraries. We use these empirical insights to motivate Aristos, which obviates the global barrier in favor of asynchronous communication, yielding native support for pipelining. Aristos also exposes tunable hyperparameters that navigate the tradeoff between faster performance and reduced token dropping.
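To make the barrier-free idea concrete, the sketch below contrasts a blocking All-to-All dispatch with a per-peer asynchronous dispatch overlapped with expert computation. It is a minimal illustration using PyTorch's torch.distributed primitives, not Aristos itself; it assumes an already initialized process group, equal-sized token chunks, and a hypothetical expert_fn.

```python
# Minimal sketch (assumptions: torch.distributed already initialized, e.g. via
# torchrun; every rank exchanges equal-sized chunks; expert_fn is a hypothetical
# per-expert forward function). Not Aristos itself, just the async pattern.
import torch
import torch.distributed as dist

def synchronous_dispatch(local_chunks):
    # Baseline: one blocking All-to-All acts as a global barrier; no expert
    # computation can start until every rank has finished the exchange.
    recv_chunks = [torch.empty_like(c) for c in local_chunks]
    dist.all_to_all(recv_chunks, local_chunks)
    return recv_chunks

def asynchronous_dispatch(local_chunks, expert_fn):
    # Barrier-free pattern: post non-blocking sends/receives, then process each
    # peer's tokens as soon as its receive completes, pipelining communication
    # with expert computation.
    rank, world = dist.get_rank(), dist.get_world_size()
    recv_chunks = [torch.empty_like(c) for c in local_chunks]  # equal sizes assumed
    send_reqs = [dist.isend(local_chunks[r], dst=r) for r in range(world) if r != rank]
    recv_reqs = {r: dist.irecv(recv_chunks[r], src=r) for r in range(world) if r != rank}

    outputs = [None] * world
    outputs[rank] = expert_fn(local_chunks[rank])  # own tokens need no transfer
    for r, req in recv_reqs.items():
        req.wait()                                 # per-peer wait, not a global barrier
        outputs[r] = expert_fn(recv_chunks[r])
    for req in send_reqs:
        req.wait()
    return outputs
```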
Abstract: Synthetic traffic traces are useful for training traffic classifiers in privacy-constrained environments. Generative Artificial Intelligence (GAI) models are blossoming as a solution to avoid sharing real data and to cope with the lack of datasets. Nevertheless, the privacy concerns raised by GAI are often underestimated. Therefore, this paper presents an approach to mitigate the data leakage of a GAI model, with minimal impact on the utility of the synthetic traffic traces for downstream applications. For example, training a Machine Learning (ML) traffic classifier on synthetic traffic traces results in an average accuracy loss of no more than 13% compared to training on a real dataset.
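A minimal sketch of the kind of utility check mentioned above: train one classifier on (stand-in) synthetic traces and one on (stand-in) real traces, then compare their accuracy on held-out real data. The fake_traces generator, the random-forest model, and all dimensions are illustrative assumptions, not the paper's pipeline.

```python
# Hypothetical utility check (illustrative only): compare a classifier trained
# on synthetic traces against one trained on real traces, both evaluated on
# held-out real data. fake_traces is a random stand-in for real/GAI features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def fake_traces(n):
    # placeholder feature/label generator, not actual traffic traces
    X = rng.normal(size=(n, 16))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

X_real, y_real = fake_traces(2000)   # stand-in for a real dataset
X_syn, y_syn = fake_traces(2000)     # stand-in for GAI-generated traces
X_test, y_test = fake_traces(1000)   # held-out real traffic

for name, (X, y) in {"real": (X_real, y_real), "synthetic": (X_syn, y_syn)}.items():
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(name, round(accuracy_score(y_test, clf.predict(X_test)), 3))
```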
Abstract: We introduce a concrete and natural definition of hierarchical trees, which is not limited to HSBMs. Notably, the definition ensures the uniqueness of the hierarchical tree with the largest number of vertices (the maximum-vertices hierarchical tree). It also encompasses flat communities as special cases, represented as primitive communities directly linked to the tree's root. This allows us to differentiate between flat and hierarchical communities. Furthermore, we propose a method to discover the maximum-vertices hierarchical tree, which merges vertices on the tree obtained by linkage.
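As a rough illustration of the starting point, the snippet below builds a "tree obtained by linkage" using SciPy's agglomerative linkage on a toy point set; the paper's vertex-merging step and its graph/HSBM setting are not reproduced here, so this is only a generic stand-in.

```python
# Generic stand-in: build the linkage tree that the proposed method would then
# post-process by merging vertices. SciPy's average-linkage clustering on a toy
# point set is used purely for illustration.
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

Z = linkage(X, method="average")  # binary merge tree over the 40 points
# Each row of Z records one merge: (node a, node b, merge distance, cluster size).
print(Z[-3:])
```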
Abstract: Federated Learning (FL) enables clients (mobile or IoT devices) to train a shared machine learning model coordinated by a central server while keeping their data local, addressing communication and privacy concerns. In the FedAvg algorithm, clients perform multiple local stochastic gradient descent (SGD) steps on their datasets and send their model updates to the server. The server then aggregates these client updates to produce the new global model and sends it back to the clients for the subsequent iteration.
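A toy FedAvg round in the spirit of the description above (not the paper's code): each client takes a few local SGD steps on a linear least-squares objective and the server averages the resulting models. The objective, step counts, and learning rate are illustrative choices.

```python
# Minimal FedAvg sketch: clients run a few local SGD steps on their own data,
# the server averages the resulting models and sends the average back.
import numpy as np

def local_sgd(w, X, y, steps=5, lr=0.1):
    # a few local gradient steps on a linear least-squares objective
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, client_data):
    updates = [local_sgd(w_global.copy(), X, y) for X, y in client_data]
    return np.mean(updates, axis=0)       # server-side aggregation

rng = np.random.default_rng(0)
d, clients = 3, 4
w_true = rng.normal(size=d)
client_data = []
for _ in range(clients):
    X = rng.normal(size=(50, d))
    client_data.append((X, X @ w_true + 0.1 * rng.normal(size=50)))

w = np.zeros(d)
for _ in range(20):                       # communication rounds
    w = fedavg_round(w, client_data)
print(np.round(w, 3), np.round(w_true, 3))
```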
Abstract: Preemptive scheduling policies are ubiquitous in queueing theory, but the analysis of such policies has not touched one important aspect: preemption overhead, the extra work needed to pause and resume a preempted job. Such an analysis is difficult because queueing systems with preemption overhead are arrival-sensitive. An arrival-sensitive queueing system is one in which a job's service time is affected by the arrivals that occur during its service. We can view a preemptive priority queue with preemption overhead as arrival-sensitive in the following way. Suppose a class-2 job J_2 is being served when a class-1 job J_1, which has higher priority, arrives. J_1's arrival triggers the preemption of J_2, incurring preemption overhead. We can view this overhead as extending J_2's service time.
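One way to formalize this "extended service time" view (notation ours, not necessarily the paper's): if a class-2 job has inherent size S_2, is preempted N times, and each preemption costs overhead c, then

```latex
\[
  S_2^{\mathrm{eff}} \;=\; S_2 \;+\; N \, c .
\]
```

Here N itself depends on the class-1 arrivals that occur during J_2's (extended) service, which is precisely what makes the system arrival-sensitive.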
Abstract: Federated learning (FL) is a prominent distributed learning framework that allows clients to train machine learning models orchestrated by a parameter server (PS). Unfortunately, its practical implementation is fundamentally hindered by heterogeneous local data distributions and intermittent client availability. Moreover, the availability of a client may be highly dynamic and hard to predict. For example, the mobility and the varying local workloads of edge devices make their availability extremely uncertain due to obstacles such as tunnels and tall buildings. Let p_i^t denote client i's availability probability at round t. Intuitively, more active clients tend to bias the global model toward their own datasets, and the extent of this bias can be substantial; Fig. 1 shows an illustration. Most existing work either assumes the clients' availability probabilities p_i^t as a known prior or limits itself to benign dynamics. In the former, the PS recruits a small set of clients uniformly at random or in proportion to the volume of their local data. Alternatively, the PS exploits the known statistics of the p_i^t's to save expensive communication or to accelerate convergence. However, it is often difficult for all parties in a distributed system to have a full grasp of the p_i^t's in real time. Moreover, the clients' availability stems inherently from their deployment environments and device setups, independent of the PS. In the latter, the dynamics are assumed to be Markovian with a stationary distribution, to exhibit regularized participation, to have time-invariant p_i^t's, or to have bounded return times. Nevertheless, as client availability is highly intricate, well-regularized or stationary participation rarely exists. Furthermore, the implementation of MIFA stresses a distributed system with additional O(md) memory, where m and d are the number of clients and the dimension of the model, respectively. Other work studies unreliable communications, yet still requires all clients to be on standby for local computations.
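A toy numerical illustration (our construction, not the paper's experiment) of the availability bias described above: if the server naively averages only the clients that show up in each round, the long-run aggregate drifts toward the clients with large p_i.

```python
# Toy illustration: naively averaging only the available clients biases the
# aggregate toward clients with large availability probability p_i.
import numpy as np

rng = np.random.default_rng(0)
m = 10                                  # number of clients
p = np.linspace(0.1, 0.9, m)            # assumed availability probabilities p_i
local_opt = rng.normal(size=m)          # each client's "ideal" model (a scalar here)

agg = []
for _ in range(10_000):                 # communication rounds
    avail = rng.random(m) < p           # which clients show up this round
    if avail.any():
        agg.append(local_opt[avail].mean())

print("uniform (unbiased) mean  :", round(float(local_opt.mean()), 3))
print("naive available-only mean:", round(float(np.mean(agg)), 3))  # skewed toward large-p_i clients
```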
Abstract: This work proposes a time-continuous model for the generic stochastic reuse problem. We have demonstrated that the model delivers accurate and fast locality analysis for simulated Zipfian workloads. Additionally, Max-Error partitioning effectively reduces the time complexity, shedding light on how the technique could be adopted in real web cache analysis and monitoring.
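As a small, self-contained example of the kind of workload referred to above, the snippet below draws a Zipfian reference stream and computes reuse times (gaps between consecutive accesses to the same item), a basic locality statistic; the paper's time-continuous model and Max-Error partitioning are not implemented here.

```python
# Draw a Zipfian reference stream and compute reuse times; item count, stream
# length, and the Zipf exponent are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n_items, n_refs, alpha = 1000, 50_000, 1.0
weights = 1.0 / np.arange(1, n_items + 1) ** alpha
weights /= weights.sum()
trace = rng.choice(n_items, size=n_refs, p=weights)

last_seen, reuse_times = {}, []
for t, x in enumerate(trace):
    if x in last_seen:
        reuse_times.append(t - last_seen[x])   # gap since the previous access to x
    last_seen[x] = t

print("median reuse time:", int(np.median(reuse_times)))
print("99th percentile  :", int(np.quantile(reuse_times, 0.99)))
```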
Abstract: Many clustering algorithms for a point cloud X_n ⊂ R^d in Euclidean space are based on density estimates. In fact, the density function f from which the points are generated contains the relevant information. It is quite natural to try to extract what HARTIGAN called 'high-density clusters'. One elegant solution to this task consists in constructing a graph whose nodes are the points of the cloud and whose edges connect nearby points. We want the connected components of this graph to reflect the high-density clusters. Some very classical algorithms, such as (Robust) Single-Linkage or (H)DBSCAN, work in this way. This construction is particularly helpful because its connected components correspond exactly to the high-density clusters of the 1-nearest-neighbor density estimator f_(1-Nearest Neighbor).
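A minimal sketch of the graph-based idea described above, using a radius graph as an illustrative choice of "edges connecting nearby points": the connected components of the graph are returned as cluster estimates. The radius r and the two-blob toy data are assumptions for illustration.

```python
# Connect points within distance r and return the connected components of the
# resulting radius graph as cluster estimates.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
# two well-separated Gaussian blobs as a toy point cloud X_n in R^2
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])

r = 0.5
adj = csr_matrix((cdist(X, X) <= r).astype(int))   # adjacency of the radius graph
n_clusters, labels = connected_components(adj, directed=False)
print(n_clusters, np.bincount(labels))             # typically: 2 components of ~50 points each
```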