Graph Neural Networks (GNNs) have garnered increasing attention for their ability to model non-Euclidean graph structures and complex features. They have been applied extensively in application domains such as recommender systems, link prediction, and traffic prediction. However, training GNN models on large-scale data poses several challenges, including irregular graph structures, complex node features, and dependencies among graph training samples. These challenges strain computational efficiency, memory management, and the communication cost of distributed computing. To overcome them, many researchers have focused on optimizing application methods, algorithm models, programming frameworks, and hardware design. This survey focuses specifically on algorithm optimization and framework acceleration for large-scale GNN models. By examining related works in these areas, it aims to help readers understand the existing research and to lay a foundation for co-optimizing GNN algorithms and frameworks for large-scale data.

This survey is structured as follows. First, we provide an overview of the challenges faced by GNNs in large-scale applications and the major optimization methods used to address them. We also compare our survey with existing surveys on GNNs; the major difference is that ours focuses specifically on GNN models in large-scale applications, summarizing and analyzing related works on GNN algorithms and framework optimization with an emphasis on scalability.

In the second section, we briefly review the message passing mechanism and classify GNN models into four categories: Graph Convolutional Networks, Graph Attention Networks, Graph Recurrent Neural Networks, and Graph Autoencoders. For each category, we introduce the major network design, including propagation and aggregation strategies, and analyze the corresponding challenges of processing large-scale data. (An illustrative sketch of one message passing layer is given at the end of this section.) We then summarize the challenges faced by GNN models in large-scale applications under both full-batch and mini-batch training modes.

Third, we classify and analyze GNN algorithms for large-scale data. We focus on sampling-based GNNs at different granularities, which use node-, layer-, and subgraph-based sampling strategies to optimize the mini-batch training of GNNs. Node-based sampling strategies usually select a fixed number of neighbors for each node, layer-based sampling methods operate at each GNN layer, and subgraph-based sampling approaches attempt to find dense subgraphs to serve as mini-batches. For each type of sampling strategy, we summarize its key ideas and related works and discuss its advantages and disadvantages.
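To make the node-based strategy concrete, the following is a minimal, illustrative Python sketch of fixed fan-out neighbor sampling in the spirit of GraphSAGE-style mini-batch construction. All names (sample_neighbors, build_mini_batch) and the adjacency-dict representation are hypothetical and not taken from any surveyed framework.

```python
import random

# Fixed fan-out (node-based) neighbor sampling, GraphSAGE-style.
# Hypothetical, self-contained sketch; not code from any surveyed framework.

def sample_neighbors(adj, node, fanout):
    """Return at most `fanout` uniformly sampled neighbors of `node`."""
    neighbors = adj.get(node, [])
    if len(neighbors) <= fanout:
        return list(neighbors)
    return random.sample(neighbors, fanout)

def build_mini_batch(adj, seed_nodes, fanouts):
    """Expand a mini-batch of seed nodes hop by hop.

    `fanouts[i]` is the per-node fan-out at hop i, so an L-layer GNN
    passes a list of L fan-outs. Returns the node frontier of each hop,
    seeds first.
    """
    frontiers = [set(seed_nodes)]
    current = set(seed_nodes)
    for fanout in fanouts:
        nxt = set()
        for node in current:
            nxt.update(sample_neighbors(adj, node, fanout))
        frontiers.append(nxt)
        current = nxt
    return frontiers

# Toy example: a 2-layer GNN with fan-out 2 at each hop.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(build_mini_batch(adj, seed_nodes=[0], fanouts=[2, 2]))
```

The hop-by-hop expansion also shows why node-based sampling can still grow quickly with depth: each hop multiplies the frontier by the fan-out, which is one motivation for the layer- and subgraph-based alternatives discussed above.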
In the fourth section, we introduce mainstream programming frameworks for GNN models, such as DGL, PyG, and Graph-Learn, summarize their characteristics, and review related optimization techniques for framework acceleration. We divide these optimization strategies into five categories: data partitioning, task scheduling, parallel execution, memory management, and other methods.

Finally, we summarize this survey and provide prospects for future work in optimizing GNN models and accelerating frameworks for large-scale data, such as reducing redundant computation, algorithm and framework co-optimization, graph-aware optimizations, support for complex graphs, flexible scheduling based on hardware features, optimizations on distributed platforms, framework and hardware co-optimization, and minimizing node representation dimensions.
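Because propagation and aggregation recur throughout the sections outlined above, we close with the sketch promised in the second-section overview: a single message passing layer with mean aggregation. This NumPy implementation is purely pedagogical; the function name and layer design are hypothetical and do not correspond to any specific surveyed model or framework.

```python
import numpy as np

# One message passing layer with mean aggregation: each node averages
# its neighbors' features (propagation/aggregation), then applies a
# shared linear transform and a ReLU nonlinearity (update).
# Hypothetical sketch; not code from any surveyed model or framework.

def message_passing_layer(features, adj, weight):
    """features: (N, d_in) node features; adj: (N, N) 0/1 adjacency;
    weight: (d_in, d_out) shared transform. Returns (N, d_out)."""
    degree = adj.sum(axis=1, keepdims=True)      # neighbor counts, (N, 1)
    degree[degree == 0] = 1                      # guard isolated nodes
    aggregated = adj @ features / degree         # mean over neighbors
    return np.maximum(aggregated @ weight, 0.0)  # linear update + ReLU

# Toy example: 3 nodes on a path graph, 2-d inputs and outputs.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.random.rand(3, 2)
w = np.random.rand(2, 2)
print(message_passing_layer(x, adj, w))
```

Stacking such layers enlarges each node's receptive field by one hop per layer, which is precisely what makes full-batch training memory-hungry and mini-batch training sampling-dependent at scale.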