首页|Falcon后量子算法的密钥树生成部件GPU并行优化设计与实现

Falcon后量子算法的密钥树生成部件GPU并行优化设计与实现

扫码查看
近年来,后量子密码算法因其具有抗量子攻击的特性成为安全领域的研究热点。基于格的Falcon数字签名算法是美国国家标准与技术研究所(NIST)公布的首批4个后量子密码标准算法之一。密钥树生成是Falcon算法的核心部件,在实际运算中占用较多的时间和消耗较多的资源。为此,提出一种基于图形处理器(GPU)的Falcon密钥树并行生成方案。该方案使用奇偶线程联合控制的单指令多线程(SIMT)并行模式和无中间变量的直接计算模式,达到了提升速度和减少资源占用的目的。基于Python的CUDA平台进行了实验,验证结果的正确性。实验结果表明,Falcon密钥树生成在RTX 3060 Laptop的延迟为6 ms,吞吐量为167次/s,在计算单个Falcon密钥树生成部件时相对于CPU实现了 1。17倍的加速比,在同时并行1 024个Falcon密钥树生成部件时,GPU相对于CPU的加速比达到了约56倍,在嵌入式Jetson Xavier NX平台上的吞吐量为32次/s。
GPU Parallel Optimal Design and Implementation of Key Tree Generation Components for Falcon Post-Quantum Algorithms
Recently,post-quantum cryptographic algorithms have become a popular research topic in the field of security owing to their resistance to quantum attacks.The lattice-based Falcon digital signature algorithm is one of the first four post-quantum cryptographic standard algorithms published by NIST.Key tree generation is the core component of the Falcon algorithm,which requires more time and consumes more resources during actual operation.Therefore,this study proposes a GPU-based parallel key tree generation scheme for Falcon that uses the Single Instruction Multiple Threads(SIMT)parallel mode with joint control of parity threads and the direct computation mode without intermediate variables to achieve speedup and reduce resource consumption.Experiments are conducted on a Python-based CUDA platform to verify the accuracy of the results.Falcon key tree generation for the RTX 3060 Laptop has a latency of 6 ms and a throughput rate of 167 times/s.It achieves a 1.17 acceleration ratio relative to the CPU when computing a single Falcon key tree generating part,whereas the GPU achieves an approximately 56 acceleration ratio relative to the CPU when 1 024 Falcon key tree generating parts are generated simultaneously;the throughput rate is 32 times/s on the embedded Jetson Xavier NX platform.

post-quantum cryptographyFalcon algorithmGraphics Processing Unit(GPU)CUDA platformparallel computing

张磊、赵光岳、肖超恩、王建新

展开 >

北京电子科技学院电子与通信工程系,北京 100070

北京电子科技学院网络空间安全系,北京 100070

后量子密码 Falcon算法 图形处理器 CUDA平台 并行计算

国家重点研发计划中央高校基本科研业务费资金中央高校基本科研业务费资金中央高校基本科研业务费资金北京高等教育"本科教学改革创新项目"北京电子科技学院一流学科建设项目

2017YFB0801803328202278328202261328202300520211001800220210064Z0401

2024

计算机工程
华东计算技术研究所 上海市计算机学会

计算机工程

CSTPCD北大核心
影响因子:0.581
ISSN:1000-3428
年,卷(期):2024.50(9)