定点FFT的DSP向量混洗并行基4算法
Vector Shuffling Based Parallel Radix-4 FFT Algorithm for Fixed Point Data on Vector DSP
王书盈 1胡勇华 1张鑫 1陆浩松1
作者信息
- 1. 湖南科技大学 计算机科学与工程学院,湖南 湘潭 411201
- 折叠
摘要
基于定点数据的快速傅里叶变换(Fast Fourier Transform,FFT)算法能在合理的精度范围内降低对硬件的要求,计算速度更快.文章面向高性能向量数字信号处理器(Digital Signal Processor,DSP)的硬件特征,构建基 4 复数FFT算法的高效指令级并行处理模型.该模型充分考虑基 4 方法下的复数FFT计算过程和蝶形组集合的特征,将SIMD计算、向量混洗、索引DMA等技术与复数FFT的基 4 变换过程充分融合,有效控制计算过程中存储器和片内缓存之间的数据块搬移需求,提升SIMD计算单元的利用率.在基于自主YHFT-M7002 处理器的FT-M7002DSK平台上进行试验研究,验证算法的有效性.试验结果表明:与CCS模拟所得TI的相应TMS320C6678 库函数性能相比,所提优化算法的平均加速比达到TI库函数的4.79倍.
Abstract
The Fast Fourier Transform(FFT)algorithm of fixed-point data can reduce the hardware requirements within a reasonable accuracy range but obtain faster computing speed.Based on the hardware characteristics of high performance vector Digital Signal Processors(DSPs),this paper constructs an efficient instruction level parallel processing algorithm of Radix-4 complex FFT algorithm.This algorithm considers the calculation process of the Radix-4 complex FFT algorithm and the characteristics of butterfly units,and it fully integrates SIMD calculation,vector shuffling,indexing DMA and other techniques with the transformation process of Radix-4 complex FFT.This algorithm effectively controls the data block movement between memory and in-chip cache during the computing process and improves the utilization rate of SIMD processing unit.In this paper,an experimental study is conducted on the FT-M7002DSK platform of the YHFT-M7002 processor,which has independent intellectual property rights.Result shows that,compared with the performance obtained by CCS simulator for the corresponding TMS320C6678 library function,the average performance of our algorithm is 3.79 times faster than the former.
关键词
基4复数FFT/SIMD技术/向量DSP/向量混洗/索引DMAKey words
Radix-4 complex FFT/SIMD technology/vector DSP/vector shuffling/index DMA引用本文复制引用
基金项目
湖南省自然科学基金(2023JJ50019)
湖南省教育厅科研项目(20B242)
湖南省教育厅科研项目(19A169)
出版年
2024