首页|定点FFT的DSP向量混洗并行基4算法

定点FFT的DSP向量混洗并行基4算法

扫码查看
基于定点数据的快速傅里叶变换(Fast Fourier Transform,FFT)算法能在合理的精度范围内降低对硬件的要求,计算速度更快.文章面向高性能向量数字信号处理器(Digital Signal Processor,DSP)的硬件特征,构建基 4 复数FFT算法的高效指令级并行处理模型.该模型充分考虑基 4 方法下的复数FFT计算过程和蝶形组集合的特征,将SIMD计算、向量混洗、索引DMA等技术与复数FFT的基 4 变换过程充分融合,有效控制计算过程中存储器和片内缓存之间的数据块搬移需求,提升SIMD计算单元的利用率.在基于自主YHFT-M7002 处理器的FT-M7002DSK平台上进行试验研究,验证算法的有效性.试验结果表明:与CCS模拟所得TI的相应TMS320C6678 库函数性能相比,所提优化算法的平均加速比达到TI库函数的4.79倍.
Vector Shuffling Based Parallel Radix-4 FFT Algorithm for Fixed Point Data on Vector DSP
The Fast Fourier Transform(FFT)algorithm of fixed-point data can reduce the hardware requirements within a reasonable accuracy range but obtain faster computing speed.Based on the hardware characteristics of high performance vector Digital Signal Processors(DSPs),this paper constructs an efficient instruction level parallel processing algorithm of Radix-4 complex FFT algorithm.This algorithm considers the calculation process of the Radix-4 complex FFT algorithm and the characteristics of butterfly units,and it fully integrates SIMD calculation,vector shuffling,indexing DMA and other techniques with the transformation process of Radix-4 complex FFT.This algorithm effectively controls the data block movement between memory and in-chip cache during the computing process and improves the utilization rate of SIMD processing unit.In this paper,an experimental study is conducted on the FT-M7002DSK platform of the YHFT-M7002 processor,which has independent intellectual property rights.Result shows that,compared with the performance obtained by CCS simulator for the corresponding TMS320C6678 library function,the average performance of our algorithm is 3.79 times faster than the former.

Radix-4 complex FFTSIMD technologyvector DSPvector shufflingindex DMA

王书盈、胡勇华、张鑫、陆浩松

展开 >

湖南科技大学 计算机科学与工程学院,湖南 湘潭 411201

基4复数FFT SIMD技术 向量DSP 向量混洗 索引DMA

湖南省自然科学基金湖南省教育厅科研项目湖南省教育厅科研项目

2023JJ5001920B24219A169

2024

湖南科技大学学报(自然科学版)
湖南科技大学

湖南科技大学学报(自然科学版)

CSTPCD北大核心
影响因子:0.675
ISSN:1672-9102
年,卷(期):2024.39(2)