计算机研究与发展 (Journal of Computer Research and Development), 2024, Vol. 61, Issue 4: 824-839. DOI: 10.7544/issn1000-1239.202220818

C-AMAT Measurement Method Based on Cache Access Mode and Its Application in Graph Computing

Chen Bingzhang (陈炳彰) 1, Liu Wei (刘伟) 2, Yu Xiaoyu (于萧钰) 1

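Author information

  • 1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430073
  • 2. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430073; Hubei Key Laboratory of Transportation Internet of Things (Wuhan University of Technology), Wuhan 430073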

Abstract

Graph applications are an important branch of big data. Although graph analysis has a clear performance advantage over traditional relational databases in explicitly representing relationships between entities, the large number of random accesses in graph processing produces irregular memory access patterns that destroy the temporal and spatial locality of memory accesses and put heavy performance pressure on the off-chip memory system. Correctly measuring the performance of graph applications in the memory system is therefore crucial to the efficient optimization of graph application architectures. Concurrent average memory access time (C-AMAT), an extension of average memory access time (AMAT), accounts for both the locality and the concurrency of memory accesses, and can therefore more accurately evaluate and analyze the memory system performance of graph applications on modern processors. However, the C-AMAT model ignores the fact that the lower cache levels of a processor are accessed serially, which makes the calculation inaccurate. In addition, some of the parameters it requires, such as pure miss cycles, are difficult to obtain, which makes C-AMAT hard to apply in practice. To match the C-AMAT computing model with the memory access modes of modern computers, we propose parallel C-AMAT (PC-AMAT) and serial C-AMAT (SC-AMAT) based on C-AMAT. PC-AMAT and SC-AMAT extend and characterize the C-AMAT computing model at a fine granularity for the parallel and serial cache access modes, respectively. On this basis, we design and implement an extraction algorithm for pure miss cycles that avoids the large hardware overhead of direct measurement. Experimental results show that, in both single-core and multi-core modes, PC-AMAT and SC-AMAT correlate more strongly with instructions per cycle (IPC) than C-AMAT does. Finally, we use PC-AMAT and SC-AMAT to measure and analyze the memory performance of graph applications, and on that basis propose memory access optimization strategies for graph applications.
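To make the quantities named in the abstract concrete, the following is a minimal sketch of the baseline AMAT and C-AMAT formulas as they are commonly stated in the original C-AMAT literature; it is not taken from this paper. The PC-AMAT and SC-AMAT formulas and the paper's pure miss cycle extraction algorithm are not given on this page and are not reproduced here. All function and parameter names are hypothetical, and the per-cycle activity lists are an assumed trace format; a pure miss cycle is assumed to mean a cycle with at least one outstanding miss and no hit activity.

```python
from typing import Sequence


def amat(hit_time: float, miss_rate: float, avg_miss_penalty: float) -> float:
    """Classic AMAT = H + MR * AMP (locality only, no concurrency)."""
    return hit_time + miss_rate * avg_miss_penalty


def c_amat(hit_time: float, hit_concurrency: float,
           pure_miss_rate: float, pure_avg_miss_penalty: float,
           pure_miss_concurrency: float) -> float:
    """Baseline C-AMAT = H / C_H + pMR * pAMP / C_M (assumed standard form)."""
    return (hit_time / hit_concurrency
            + pure_miss_rate * pure_avg_miss_penalty / pure_miss_concurrency)


def count_pure_miss_cycles(hits_in_flight: Sequence[int],
                           misses_in_flight: Sequence[int]) -> int:
    """Naive direct count over a per-cycle trace (assumed format):
    a cycle is 'pure miss' if misses are outstanding and no hit is active."""
    return sum(1 for h, m in zip(hits_in_flight, misses_in_flight)
               if m > 0 and h == 0)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    print(amat(2.0, 0.25, 40.0))
    print(c_amat(2.0, 4.0, 0.125, 40.0, 2.0))
    print(count_pure_miss_cycles([1, 0, 0, 2, 0], [1, 1, 2, 0, 0]))
```

With both concurrency terms set to 1 and the pure miss parameters set equal to the conventional ones, `c_amat` reduces to `amat`, which is the sense in which C-AMAT extends AMAT. The direct per-cycle count shown here is the kind of measurement the abstract describes as costly in hardware.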

Key words

graph application / graph analysis / average memory access time (AMAT) / concurrent average memory access time (C-AMAT) / pure miss cycle / cache


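Funding

National Natural Science Foundation of China (62272356)

Open Fund of the State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences) (CARCHB202015)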

Publication year

2024
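计算机研究与发展 (Journal of Computer Research and Development)
Institute of Computing Technology, Chinese Academy of Sciences; China Computer Federation

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 2.649
ISSN: 1000-1239
Number of references: 27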