FPGA Efficient Scalability Optimization of Dilithium

To improve the operational efficiency of Dilithium in practical applications, an efficient and scalable field programmable gate array (FPGA) implementation of the Dilithium algorithm is proposed. The optimization covers three aspects. First, the Karatsuba-Ofman algorithm (KOA) is combined with a fast modular reduction algorithm to form a fast modular multiplication unit, which accelerates the large number of polynomial multiplications carried out via the number theoretic transform (NTT). Second, multiple random access memory (RAM) blocks store the polynomial coefficients involved in the computation, and a coefficient reading strategy tailored to the characteristics of Dilithium is designed so that coefficients are read from RAM quickly and correctly. Third, for the sampling and hashing tasks in the scheme, the characteristics of the SHAKE family of algorithms are analyzed, and a low-latency, scalable Keccak hardware architecture is designed that executes different SHAKE algorithms depending on the input signal. Experimental results show that the operating frequency of the proposed design is 60.7%~131.9% higher than that of other schemes, while balancing hardware resource consumption and execution efficiency.
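The abstract does not give the internals of the fast modular multiplication unit, so the following is only a software sketch of the general idea, not the authors' hardware design. It assumes a single-level Karatsuba-Ofman split of the 23-bit operands into 12-bit halves and a reduction that exploits the special form of the Dilithium modulus q = 8380417 = 2^23 - 2^13 + 1. The function names `karatsuba_mul`, `reduce_q`, and `mod_mul` are hypothetical.

```python
Q = 8380417  # Dilithium modulus: 2**23 - 2**13 + 1


def karatsuba_mul(a, b, half_bits=12):
    """One-level Karatsuba-Ofman product using three half-width multiplies.

    The 12-bit split point is an assumption made for illustration.
    """
    mask = (1 << half_bits) - 1
    a0, a1 = a & mask, a >> half_bits
    b0, b1 = b & mask, b >> half_bits
    z0 = a0 * b0                               # low halves
    z2 = a1 * b1                               # high halves
    z1 = (a0 + a1) * (b0 + b1) - z0 - z2       # cross terms from one multiply
    return (z2 << (2 * half_bits)) + (z1 << half_bits) + z0


def reduce_q(x):
    """Reduce a non-negative integer modulo Q with shifts, adds and subtracts.

    Uses 2**23 = 2**13 - 1 (mod Q) to fold the high bits back into the low
    23 bits until the value fits, then applies one conditional subtraction.
    """
    while x >> 23:
        hi, lo = x >> 23, x & 0x7FFFFF
        x = (hi << 13) - hi + lo               # hi * (2**13 - 1) + lo
    return x - Q if x >= Q else x


def mod_mul(a, b):
    """Multiply two coefficients in [0, Q) and reduce the product modulo Q."""
    return reduce_q(karatsuba_mul(a, b))
```

In this sketch the full-width multiplier is replaced by three narrower ones and the division by q is replaced by folding steps, which is the kind of trade-off that typically shortens the critical path of an NTT butterfly in hardware.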

Dilithium algorithm; FPGA; NTT; Hardware implementation

YAN Yunfei, LI Bin, WEI Yuanxin, ZHANG Bolin, MA Tianyi, ZHOU Qinglei

School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China

Dilithium algorithm; field programmable gate array; number theoretic transform; hardware implementation

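Science and Technology Research Program of Henan Province (232102211055); Research Project of the Henan Key Laboratory of Network Cryptography Technology (LNCT2022-A14)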

2024

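Computer Science
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)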

CSTPCD; Peking University Core Journal
Impact factor: 0.944
ISSN:1002-137X
Year, volume (issue): 2024, 51(z1)