Journal information
ACM Computing Surveys
Association for Computing Machinery

Bimonthly

ISSN 0360-0300

Indexed in: SCI, ISTP, AHCI
Formally published

    Knowledge Distillation on Graphs: A Survey

    YIJUN TIAN, SHICHAO PEI, XIANGLIANG ZHANG, CHUXU ZHANG...
    pp. 189.1-189.16
    Abstract: Graph Neural Networks (GNNs) have received significant attention for demonstrating their capability to handle graph data. However, they are difficult to deploy on resource-limited devices because of model sizes and scalability constraints imposed by the multi-hop data dependency. In addition, real-world graphs usually possess complex structural information and features. Therefore, to improve the applicability of GNNs and fully encode the complicated topological information, Knowledge Distillation on Graphs (KDG) has been introduced to build a smaller but effective model, leading to model compression and performance improvement. Recently, KDG has achieved considerable progress, with many studies proposed. In this survey, we systematically review these works. Specifically, we first introduce the challenges and bases of KDG, then categorize and summarize the existing work of KDG by answering the following three questions: (1) what to distill, (2) who to whom, and (3) how to distill. We offer in-depth comparisons and elucidate the strengths and weaknesses of each design. Finally, we share our thoughts on future research directions.

    Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey

    JIAYI KUANG, YING SHEN, JINGYOU XIE, HAOHAO LUO...
    pp. 190.1-190.36
    Abstract: Visual Question Answering (VQA) is a challenging task that combines natural language processing and computer vision techniques and has gradually become a benchmark task for multimodal large language models (MLLMs). The goal of our survey is to provide an overview of the development of VQA and a timely, detailed description of the latest models. This survey gives an up-to-date synthesis of natural language understanding of images and text, as well as the knowledge reasoning module based on image-question information on the core VQA tasks. In addition, we elaborate on recent advances in extracting and fusing modal information with vision-language pretraining models and multimodal large language models in VQA. We also exhaustively review the progress of knowledge reasoning in VQA by detailing the extraction of internal knowledge and the introduction of external knowledge. Finally, we present the datasets of VQA and different evaluation metrics and discuss possible directions for future work.

    Compact Data Structures for Network Telemetry

    SHIR LANDAU-FEIBISH, ZAOXING LIU, JENNIFER REXFORD
    pp. 191.1-191.31
    Abstract: Collecting and analyzing network traffic data (network telemetry) plays a critical role in managing modern networks. Network administrators analyze their traffic to troubleshoot performance and reliability problems and to detect and block cyberattacks. However, conventional traffic-measurement techniques offer limited visibility into network conditions and rely on offline analysis. Fortunately, network devices, such as switches and network interface cards, are increasingly programmable at the packet level, enabling flexible analysis of the traffic in place, as the packets fly by. However, to operate at high speed, these devices have limited memory and computational resources, leading to trade-offs between accuracy and overhead. In response, an exciting research area has emerged, bringing ideas from compact data structures and streaming algorithms to bear on important network telemetry applications and the unique characteristics of high-speed network devices. In this article, we review the research on compact data structures for network telemetry and discuss promising directions for future research.

    Adversarial Patterns: Building Robust Android Malware Classifiers

    DIPKAMAL BHUSAL, NIDHI RASTOGI
    pp. 192.1-192.34
    Abstract: Machine learning models are increasingly being adopted across various fields, such as medicine, business, autonomous vehicles, and cybersecurity, to analyze vast amounts of data, detect patterns, and make predictions or recommendations. In the field of cybersecurity, these models have made significant improvements in malware detection. However, despite their ability to understand complex patterns from unstructured data, these models are susceptible to adversarial attacks that perform slight modifications in malware samples, leading to misclassification from malignant to benign. Numerous defense approaches have been proposed to either detect such adversarial attacks or improve model robustness. These approaches have resulted in a multitude of attack and defense techniques and the emergence of a field known as 'adversarial machine learning.' In this survey paper, we provide a comprehensive review of adversarial machine learning in the context of Android malware classifiers. Android is the most widely used operating system globally and is an easy target for malicious agents. The paper first presents an extensive background on Android malware classifiers, followed by an examination of the latest advancements in adversarial attacks and defenses. Finally, the paper provides guidelines for designing robust malware classifiers and outlines research directions for the future.

    Towards Lifelong Learning of Large Language Models: A Survey

    JUNHAO ZHENG, SHENGJIE QIU, CHENGMING SHI, QIANLI MA...
    pp. 193.1-193.35
    Abstract: As the applications of large language models (LLMs) expand across diverse fields, their ability to adapt to ongoing changes in data, tasks, and user preferences becomes crucial. Traditional training methods with static datasets are inadequate for coping with the dynamic nature of real-world information. Lifelong learning, or continual learning, addresses this by enabling LLMs to learn continuously and adapt over their operational lifetime, integrating new knowledge while retaining previously learned information and preventing catastrophic forgetting. Our survey explores the landscape of lifelong learning, categorizing strategies into two groups based on how new knowledge is integrated: Internal Knowledge, where LLMs absorb new knowledge into their parameters through full or partial training, and External Knowledge, which incorporates new knowledge as external resources such as Wikipedia or APIs without updating model parameters. The key contributions of our survey include: (1) introducing a novel taxonomy to categorize the extensive literature of lifelong learning into 12 scenarios; (2) identifying common techniques across all lifelong learning scenarios and classifying existing literature into various technique groups; (3) highlighting emerging techniques such as model expansion and data selection, which were less explored in the pre-LLM era.

    Green Federated Learning: A New Era of Green Aware AI

    DIPANWITA THAKUR, ANTONELLA GUZZO, GIANCARLO FORTINO, FRANCESCO PICCIALLI...
    pp. 194.1-194.36
    Abstract: The development of AI applications, especially in large-scale wireless networks, is growing exponentially, alongside the size and complexity of the architectures used. In particular, machine learning is acknowledged as one of today's most energy-intensive computational applications, posing a significant challenge to the environmental sustainability of next-generation intelligent systems. Achieving environmental sustainability entails ensuring that every AI algorithm is designed with sustainability in mind, integrating green considerations from the architectural phase onwards. Recently, Federated Learning (FL), with its distributed nature, has presented new opportunities to address this need. Hence, it is imperative to elucidate the potential and challenges stemming from recent FL advancements and their implications for sustainability. Moreover, it is crucial to furnish researchers, stakeholders, and interested parties with a roadmap to navigate and understand existing efforts and gaps in green-aware AI algorithms. This survey primarily aims to achieve this objective by identifying and analyzing over a hundred FL works and assessing their contributions to green-aware artificial intelligence for sustainable environments, with a specific focus on IoT research. It delves into current issues in green federated learning from an energy-efficient standpoint, discussing potential challenges and future prospects for green IoT application research.

    Blockchain-Empowered Trustworthy Data Sharing: Fundamentals, Applications, and Challenges

    THANH LINH NGUYEN, LAM NGUYEN, THONG HOANG, DILUM BANDARA...
    pp. 195.1-195.36
    Abstract: The rise of data-sharing platforms, driven by public demand for open data and legislative mandates, has raised several pertinent issues. These encompass uncertainties over data accuracy, provenance and lineage, privacy concerns, consent management, and the lack of equitable incentives for data providers. The advanced nature of blockchain makes it well suited to address these concerns. Yet, the limitations of blockchains, particularly their restricted performance, scalability, and high cost, make them less adept at managing the four "Vs" of big data: volume, variety, velocity, and veracity. As the body of work proposing blockchain-based data-sharing solutions grows, so does the confusion in selecting between these platforms, particularly in terms of sharing mechanisms, services, quality of services, and applications. In this article, we aim to fill this knowledge gap through an in-depth survey of blockchain-based data-sharing architectures and applications. We first identify the key challenges of existing data-sharing techniques and lay out the foundations of blockchains. Our focus then shifts to the intersection of blockchain and data sharing, wherein we aim to clarify the existing landscape and propose a reference architecture for blockchain-based data sharing. Subsequently, we explore various industrial applications of blockchain-based data sharing, spanning healthcare, smart grids, transportation, and decarbonization. For each application, we draw from real-world deployments to present key lessons learned in the implementation of blockchain-based data sharing. Lastly, we shed light on current research challenges and open avenues for further study in this space. This article aims to serve as a comprehensive resource for researchers/practitioners looking to navigate the complex terrain of blockchain-based data-sharing solutions.

    A Comprehensive Survey on Big Data Analytics: Characteristics, Tools and Techniques

    MOHAMMAD SHAHNAWAZ, MANISH KUMAR
    pp. 196.1-196.33
    Abstract: Modern computing devices generate vast amounts of diverse data. It means that a fast transition through various computing devices leads to big data production. Big data with high velocity, volume, and variety presents challenges like data inconsistency, scalability, real-time analysis, and tool selection. Although numerous solutions have been proposed for big data processing, they are often limited in scope and effectiveness. This survey aims to address the lack of comprehensive analysis of big data challenges in relation to machine learning (ML) and the Internet of Things (IoT) environments, particularly concerning the 7Vs of big data. It emphasizes the significance of selecting suitable tools to address each unique big data characteristic, providing a structured approach to manage these challenges effectively. The article systematically reviews big data characteristics and associated techniques, with a detailed discussion of various tools and their applications. Additionally, it analyzes existing ML methods and techniques for IoT data analytics in big data contexts. Through a systematic literature review (SLR), we examine key aspects, including core concepts, benefits, limitations, and the impact of big data on ML algorithms and IoT data analytics. We highlight groundbreaking studies addressing big data challenges to impact future research and enhance big data-driven applications.

    Making Sense of Big Data in Intelligent Transportation Systems: Current Trends, Challenges and Future Directions

    MIAN AHMAD JAN, MUHAMMAD ADIL, BOUZIANE BRIK, SAAD HAROUS...
    pp. 197.1-197.43
    Abstract: Intelligent Transportation Systems (ITS) generate massive amounts of Big Data through both sensory and non-sensory platforms. The data support batch processing as well as stream processing, which are essential for reliable operations on the roads and connected vehicles in ITS. Despite the immense potential of Big Data intelligence in ITS, autonomous vehicles are largely confined to testing and trial phases. The research community is working tirelessly to improve the reliability of ITS by designing new protocols, standards, and connectivity paradigms. In the recent past, several surveys have been conducted that focus on Big Data Intelligence for ITS, yet none of them have comprehensively addressed the fundamental challenges hindering the widespread adoption of autonomous vehicles on the roads. Our survey aims to help readers better understand the technological advancements by delving deep into Big Data architecture, focusing on data acquisition, data storage, and data visualization. We reviewed sensory and non-sensory platforms for data acquisition, data storage repositories for archival and retrieval of large datasets, and data visualization for presenting the processed data in an interactive and comprehensible format. To this end, we discussed the current research progress by comprehensively covering the literature and highlighting challenges that urgently require the attention of the research community. Based on the concluding remarks, we argued that these challenges hinder the widespread presence of autonomous vehicles on the roads. Understanding these challenges is important for a more informed discussion on the future of self-driven technology. Moreover, we acknowledge that these challenges not only affect individual layers but also impact the functionality of subsequent layers. Finally, we outline our future work that explores how resolving these challenges could enable the realization of innovations such as smart charging systems on the roads and data centers on wheels.

    Public Datasets for Cloud Computing: A Comprehensive Survey

    GUOZHI LIU, WEIWEI LIN, HAOTONG ZHANG, JIANPENG LIN...
    pp. 198.1-198.38
    Abstract: Publicly available datasets are vital to researchers because they permit the testing of new algorithms under a variety of conditions and ensure the verifiability and reproducibility of scientific experiments. In cloud computing research, there is a particular dependence on obtaining load traces and network traces from real cloud computing clusters, which are used for designing energy efficiency prediction, workload analysis, and anomaly detection solutions. To address the current lack of a comprehensive overview and thorough analysis of cloud computing datasets and to gain insight into their current status and future trends, in this article, we provide a comprehensive survey of existing publicly available cloud computing datasets. First, we utilize a systematic mapping approach to analyze 968 scientific papers from 6 scientific databases, resulting in the retrieval of 42 datasets related to cloud computing. Second, we categorize these datasets based on 11 characteristics to assist researchers in quickly finding datasets suitable for their specific needs. Third, we provide detailed descriptions of each dataset to assist researchers in gaining a clearer understanding of their characteristics. Fourth, we select 12 mainstream datasets and conduct a comprehensive analysis and comparison of their characteristics. Finally, we discuss the weaknesses of existing datasets, identify challenges, provide recommendations for long-term dataset maintenance and updates, and outline directions for the future creation of new cloud computing datasets.