Microelectronics & Computer, 2024, Vol. 41, Issue 6: 103-114. DOI: 10.19304/J.ISSN1000-7180.2023.0330

Flex-DMA: design of a high-performance multi-transfer-mode DMA system

LI Dejian 1, FENG Xi 1, WANG Guoxuan 2, TAN Lang 1, SHEN Chongfei 1, FAN Zhihua 2, LI Wenming 2

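Author information

  • 1. Beijing Smartchip Microelectronics Technology Co., Ltd., Beijing 100192
  • 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049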

Abstract

With the rapid development of data-intensive science and high-throughput applications, application-specific integrated circuit (ASIC) designs continue to emerge, and transfer systems are now expected to do more than move data. Some existing Direct Memory Access (DMA) designs support efficient matrix-transpose transfers, but they cannot handle complex memory access patterns and lack flexible configurability, which lowers computational efficiency. To address these problems, a configurable multi-mode transfer system, Flex-DMA, is designed. It comprises configurable registers and transfer channels and provides a basic mode and a Single Instruction Multiple Data (SIMD) mode. Flex-DMA can therefore select a transfer mode according to the data transfer requirement, flexibly configure data scale and data format, and support functions such as data vectorization conversion and matrix-transpose transfer. A performance evaluation in a massively parallel simulation framework shows that Flex-DMA achieves an average speedup of 5.14 times in data vectorization processing. In addition, compared with the MT-DMA structure, Flex-DMA achieves an average performance improvement of 2.52 times in matrix-transpose transfers. The experiments show that Flex-DMA can meet complex memory access patterns and transfer requirements, and that it reorganizes and preprocesses data with low transfer latency.
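As a rough illustration of what a configurable transfer with in-flight matrix transposition means, the following is a minimal software sketch. It is not the Flex-DMA hardware design: the `TransferConfig` descriptor, its field names, and the `dma_transfer` function are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TransferConfig:
    """Hypothetical transfer descriptor; field names are illustrative only."""
    rows: int                 # number of rows in the source block
    cols: int                 # number of columns in the source block
    transpose: bool = False   # reorder the data while it is being moved


def dma_transfer(src: List[int], cfg: TransferConfig) -> List[int]:
    """Copy a row-major rows x cols block, optionally transposing it in flight."""
    if len(src) != cfg.rows * cfg.cols:
        raise ValueError("source length does not match the configured shape")
    if not cfg.transpose:
        # Plain copy: the destination layout equals the source layout.
        return list(src)
    # Transposed copy: the destination is a row-major cols x rows block,
    # so source element (r, c) lands at destination position (c, r).
    dst = [0] * len(src)
    for r in range(cfg.rows):
        for c in range(cfg.cols):
            dst[c * cfg.rows + r] = src[r * cfg.cols + c]
    return dst


block = [1, 2, 3, 4, 5, 6]  # a 2 x 3 matrix stored row-major
moved = dma_transfer(block, TransferConfig(rows=2, cols=3, transpose=True))
```

The point of the sketch is that the receiver gets the data already reorganized, so no separate transpose pass is needed after the transfer.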

Key words

direct memory access (DMA)/SIMD mode/data transmission/matrix transpose/preprocessing

Funding

Science and Technology Project of State Grid Corporation of China (5700-202041264A-0-0-00)

Publication year

2024
Microelectronics & Computer
The 771st Research Institute, Ninth Academy of China Aerospace Science and Technology Corporation

CSTPCD
Impact factor: 0.431
ISSN: 1000-7180