Journal of Computer Research and Development (计算机研究与发展), 2024, Vol. 61, Issue 8: 2097-2109. DOI: 10.7544/issn1000-1239.202220531

Bitsliced Optimization of SM4 Algorithm with the SIMD Instruction Set

Wang Chuang (王闯), Ding Yan (丁滟), Huang Chenlin (黄辰林), Song Liantao (宋连涛)

Author information

  • 1. College of Computer, National University of Defense Technology, Changsha 410073

Abstract

SM4 is a commercial block cipher independently designed in China, and its encryption and decryption performance has become one of the critical factors affecting how well information systems can protect data confidentiality. Existing SM4 optimizations mainly focus on hardware designs and software look-up tables; the former depend on specific hardware environments, while the latter are inefficient and vulnerable to side-channel attacks. Bit slicing reorganizes the input data so that block ciphers can be processed efficiently in parallel, and it resists cache-based side-channel attacks. However, existing research on bitsliced block ciphers is strongly tied to particular hardware platforms and supports only a single processor architecture, and the parallel processing pipeline is slow to start, so encryption and decryption of small-scale data can hardly take full advantage of advanced instruction sets such as SIMD (single instruction multiple data). To address these problems, we first propose a cross-platform general bitsliced block cipher algorithm model, which provides a consistent, general data slicing method for different processor instruction word lengths. On this basis, we propose a fine-grained bitsliced parallel SM4 optimization algorithm for SIMD instruction sets, which effectively shortens the startup time of the algorithm through fine-grained plaintext slicing and reorganization together with linear transformation optimization. Experiments show that, compared with the general look-up-table-based SM4 algorithm, the optimized bitsliced SM4 algorithm reaches an encryption rate of up to 438.0 MBps, needs as few as 7.0 CPB (cycles per byte) in the best case, and improves encryption performance by 80.4% to 430.3% on average.
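The bit-slicing idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function names (`bitslice`, `unbitslice`, `toy_round_scalar`, `toy_round_sliced`) are hypothetical, the 4-bit mixing function is a toy stand-in rather than the SM4 S-box, and plain Python integers play the role of SIMD registers. The sketch only shows how reorganizing the input into bit planes lets one bitwise operation act on every block at once, with no data-dependent table lookup.

```python
def bitslice(blocks, width):
    """Transpose blocks into bit planes.

    Bit j of slices[i] equals bit i of blocks[j].
    """
    slices = [0] * width
    for j, block in enumerate(blocks):
        for i in range(width):
            slices[i] |= ((block >> i) & 1) << j
    return slices


def unbitslice(slices, count):
    """Inverse of bitslice: rebuild `count` blocks from the bit planes."""
    blocks = [0] * count
    for i, plane in enumerate(slices):
        for j in range(count):
            blocks[j] |= ((plane >> j) & 1) << i
    return blocks


def toy_round_scalar(x):
    """Toy 4-bit mixing function (NOT the SM4 S-box), one value at a time."""
    x0, x1, x2, x3 = [(x >> i) & 1 for i in range(4)]
    y0 = x0 ^ (x1 & x2)
    y1 = x1 ^ x3
    y2 = x2 ^ (x0 & x3)
    y3 = x3 ^ x0
    return y0 | (y1 << 1) | (y2 << 2) | (y3 << 3)


def toy_round_sliced(slices):
    """Same toy function on bit planes: each operator handles all blocks."""
    x0, x1, x2, x3 = slices
    return [x0 ^ (x1 & x2), x1 ^ x3, x2 ^ (x0 & x3), x3 ^ x0]


# Process all sixteen 4-bit inputs in one sliced pass and compare
# against the value-by-value reference computation.
blocks = list(range(16))
expected = [toy_round_scalar(b) for b in blocks]
actual = unbitslice(toy_round_sliced(bitslice(blocks, 4)), len(blocks))
assert actual == expected
```

In a real SIMD implementation each bit plane would occupy a vector register, so the number of blocks handled per instruction grows with the register width. The transposition steps are pure overhead in that setting, which is consistent with the abstract's concern about startup cost for small inputs.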

Key words

SM4 algorithm/performance optimization/bit slice/side-channel attacks/SIMD instruction set

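Funding

Joint Funds of the National Natural Science Foundation of China (U19A2060)

National Natural Science Foundation of China (62172431)

Foundation Strengthening Program, Key Basic Research Project (2019-XXX-ZD-188-00)

Hunan Provincial Postgraduate Scientific Research Innovation Project (CX20220056)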

Publication year

2024
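Journal of Computer Research and Development (计算机研究与发展)
Institute of Computing Technology, Chinese Academy of Sciences; China Computer Federation

CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 2.649
ISSN: 1000-1239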