基于2KNTT的多项式乘法单元设计
A Polynomial Multiplier Design Based on 2KNTT
陈韬 1李慧琴 1吴艾青 1李伟 1南龙梅1
作者信息
- 1. 中国人民解放军战略支援部队信息工程大学,河南郑州 450000
- 折叠
摘要
在格基抗量子公钥密码算法的基础运算中,多项式乘法在硬件实现上消耗大量的时间.为提高实际运算性能,本文通过分析多项式乘法运算中数论变换的快速实现算法,提出一种面向CRYSTALS-Kyber算法、适应硬件实现的2n次单位根预处理型快速数论变换算法架构,利用小位宽数论变换的并行处理与复杂度低的计算形式来减少运算时间.整体运算架构在结合算法特殊性质后,确定了32路并行的设计模型.在此基础上,设计了一种与该架构匹配的统一化运算单元和数据读写不冲突、地址分配最优的存储单元.实验结果表明,在65 nm的互补金属氧化物半导体(CMOS)工艺下,97 ns完成一组项数为256、模数为3 329的多项式乘法运算,花费108个周期,最高工作频率可达到1.1 GHz,面积时间积为20.7(kGE×μs).
Abstract
Polynomial multiplication consumes a lot of time in hardware implementation in the underlying operations of Lattice-based post-quantum public-key cryptography algorithms.The paper analyzes the fast implementation of number theoretic transform algorithm in polynomial multiplication operations for CRYSTALS-Kyber and proposes a 2n-th unit root preprocessing fast number theoretic transform algorithm architecture that adapts to the hardware implementation.In order to reduce computing time,the architecture uses parallel processing of small bit-width number theoretic transformation and low-complexity computations.Taking into account the characteristics of the algorithm,the overall computing architecture adopts a 32-way parallel design model.Based on this,we design a unified computing unit that matches the architecture and a storage unit with non-conflicting mechanism while reading or writing data and optimal address assignment.Under the CMOS 65 nm process,a set of polynomial multiplication operations with term number 256 and modulus 3 329 can be com-pleted in 108 cycles within 97 ns.The maximum operating frequency can reach 1.1 GHz,and the area time product is 20.7(kGE×μs).
关键词
格基抗量子公钥密码算法/CRYSTALS-Kyber/多项式乘法/2KNTT/硬件实现Key words
Lattice-based post-quantum public-key cryptography/CRYSTALS-Kyber/polynomial multiplication/2KNTT/hardware design引用本文复制引用
基金项目
国家自然科学基金(61404175)
国家科技重大专项(2018ZX01027101-00)
出版年
2024