An Architecture-Enhanced Performance Predictor for Transformer-Based NAS
Neural Architecture Search (NAS) represents a paradigm shift in the design of neural network architectures, moving from manual, intuition-driven processes to automated, algorithm-driven methods. This evolution has significantly reduced design costs and improved model performance across tasks in computer vision, natural language processing, and beyond. Despite these advances, the computational expense of evaluating the performance of candidate architectures remains a formidable challenge, consuming the majority of resources in NAS pipelines.

NAS predictors built on the Transformer architecture offer a promising way to mitigate these computational demands. The Transformer's strong capability for encoding the topological information of neural networks makes it an attractive foundation for predictors that can efficiently approximate the performance of vast numbers of architectures without exhaustive training. However, deploying Transformer-based NAS predictors raises three significant obstacles.

The first challenge arises in the preprocessing stage, where traditional encoding schemes such as one-hot encoding fall short of capturing the full spectrum of node features within a network architecture. Such schemes can identify different types of operations but lack the granularity to describe operational parameters, such as convolutional kernel sizes, that are critical for accurately modeling network behavior.

The second challenge arises in the encoding stage, where reliance on the Transformer's self-attention mechanism risks overlooking essential structural information. Self-attention is powerful for capturing global dependencies, but it does not fully preserve the hierarchical and spatial relationships inherent in neural network architectures, relationships that are crucial for understanding their performance.

The third challenge arises in the evaluation stage, where existing predictors mainly use a Multilayer Perceptron (MLP) to estimate architecture performance from forward-propagation information alone. This approach neglects the influence of back-propagation gradients, an oversight that can lead to substantial inaccuracies: the interplay between forward and backward propagation during training is complex, and a predictor that ignores it is prone to considerable error between estimated and actual performance.

To address these challenges, our proposed solution, "An Architecture-Enhanced Performance Predictor for Transformer-Based NAS," introduces targeted strategies at each stage of the prediction process. In the preprocessing stage, we propose a novel encoding scheme, Hyper-dimensional Embedding, which significantly expands the input feature space to provide a richer, more nuanced representation of node operations. This allows a more detailed characterization of architectural components and improves the predictor's ability to discern subtle differences between architectures. To resolve the issues in the encoding stage, our approach fuses the Transformer's output with explicit structural information through a Graph Convolutional Network (GCN). This ensures that the spatial and hierarchical relationships between architectural components are preserved and factored into the prediction, addressing the deficiencies of the self-attention mechanism in preserving structural integrity.
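To make the encoding idea concrete, the following is a minimal sketch of a parameter-aware node encoding in the spirit of Hyper-dimensional Embedding. The operation vocabulary, normalization constants, and feature layout below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

# Hypothetical vocabulary of candidate operations (illustrative only).
OP_TYPES = ["input", "conv", "pool", "skip", "output"]

def encode_node(op_type: str, kernel_size: int = 0, channels: int = 0) -> np.ndarray:
    """Parameter-aware node embedding (sketch).

    A plain one-hot vector only identifies the operation type; here we append
    normalized operational parameters (kernel size, channel count) so that,
    e.g., a 3x3 and a 5x5 convolution receive distinct encodings.
    """
    one_hot = np.zeros(len(OP_TYPES), dtype=np.float32)
    one_hot[OP_TYPES.index(op_type)] = 1.0
    # Assumed normalization maxima (kernel size <= 7, channels <= 512).
    params = np.array([kernel_size / 7.0, channels / 512.0], dtype=np.float32)
    return np.concatenate([one_hot, params])

# A 3x3 conv and a 5x5 conv now map to different feature vectors,
# whereas pure one-hot encoding would make them indistinguishable.
print(encode_node("conv", kernel_size=3, channels=64))
print(encode_node("conv", kernel_size=5, channels=64))
```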
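Similarly, the fusion of Transformer token embeddings with graph convolutions over the architecture's adjacency matrix could be sketched as follows. Module names, dimensions, and the simplified mean-aggregation propagation rule are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn

class TransformerGCNEncoder(nn.Module):
    """Sketch: fuse Transformer node embeddings with a GCN pass over the
    architecture's adjacency matrix to reinject explicit topology."""

    def __init__(self, feat_dim: int, d_model: int = 64):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.gcn_weight = nn.Linear(d_model, d_model)

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (batch, num_nodes, feat_dim); adj: (batch, num_nodes, num_nodes)
        h = self.transformer(self.proj(node_feats))                 # global self-attention mixing
        a_hat = adj + torch.eye(adj.size(-1), device=adj.device)    # add self-loops
        deg = a_hat.sum(-1, keepdim=True).clamp(min=1.0)
        h = torch.relu(self.gcn_weight(a_hat @ h / deg))            # neighborhood aggregation
        return h.mean(dim=1)                                        # graph-level embedding
```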
Finally, in the evaluation stage, we construct a more comprehensive model of network behavior by incorporating both forward- and backward-propagation dynamics into the predictive framework. A combination of dataset characteristics, structural encodings, and gradient information is fed into the GCN predictor, yielding a more accurate and holistic assessment of architectural performance (a minimal sketch of this evaluation head is given below).

Our experimental results demonstrate the effectiveness of this approach, showing an improvement in the Kendall correlation coefficient of 7.45% and a 1.55x reduction in training time compared with state-of-the-art methods. This advancement underscores the potential of our proposed solution to enhance the efficiency and accuracy of NAS, paving the way for more rapid and cost-effective development of high-performing neural networks.
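The sketch below illustrates the evaluation-stage idea of concatenating the graph-level embedding with dataset and gradient statistics before regression, and how ranking quality is measured with Kendall's tau. For brevity the head is shown as a simple MLP regressor, whereas the paper's actual predictor is GCN-based; all feature dimensions and the toy numbers are assumptions:

```python
import torch
import torch.nn as nn
from scipy.stats import kendalltau

class GradientAwarePerformanceHead(nn.Module):
    """Sketch: regress accuracy from the graph embedding concatenated with
    dataset characteristics and back-propagation gradient statistics."""

    def __init__(self, graph_dim: int = 64, data_dim: int = 8, grad_dim: int = 8):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(graph_dim + data_dim + grad_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, graph_emb, data_feats, grad_feats):
        # grad_feats could hold summary statistics of back-propagation
        # gradients (e.g., per-layer gradient norms from a few warm-up steps).
        x = torch.cat([graph_emb, data_feats, grad_feats], dim=-1)
        return self.head(x).squeeze(-1)

# Ranking quality is judged with Kendall's tau between predicted and true accuracies.
pred = torch.tensor([0.71, 0.68, 0.90, 0.84])
true = torch.tensor([0.70, 0.72, 0.92, 0.83])
tau, _ = kendalltau(pred.numpy(), true.numpy())
print(f"Kendall tau: {tau:.3f}")
```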