Study on Binary Code Similarity Detection Based on Jump-SBERT

Binary code similarity detection plays an important role in many security fields. Existing methods suffer from high computational cost combined with low accuracy, incomplete recognition of the semantic information of binary functions, and evaluation on a single dataset. To address these problems, a binary code similarity detection technique based on Jump-SBERT is proposed. Jump-SBERT has two main innovations. First, it uses a Siamese network to build the SBERT network structure, which reduces the computational cost of the model while keeping its accuracy unchanged. Second, it introduces a jump recognition mechanism that enables Jump-SBERT to learn the graph structure information of binary functions and thus capture their semantic information more comprehensively. Experimental results show that the recognition accuracy of Jump-SBERT can reach 96.3% in a small function pool (32 functions) and 85.1% in a large function pool (10000 functions), which is 36.13% higher than state-of-the-art (SOTA) methods, and that Jump-SBERT is more stable in large-scale binary code similarity detection. Ablation experiments show that both innovations have a positive effect on Jump-SBERT, with the jump recognition mechanism contributing up to 9.11%.
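
The abstract reports accuracy within a function pool of a given size but does not spell out the metric. A minimal sketch of one common reading, assuming the score is top-1 retrieval accuracy under cosine similarity, is given below. The names `cosine` and `pool_top1_accuracy` and the toy 3-dimensional embeddings are illustrative assumptions, not the authors' code or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pool_top1_accuracy(queries, pool):
    """Fraction of queries whose most similar pool entry is the true match.

    Assumption: queries[i] and pool[i] are embeddings of the same source
    function (for example, compiled with different options), so index i
    is the ground-truth match for query i.
    """
    hits = 0
    for i, query in enumerate(queries):
        best = max(range(len(pool)), key=lambda j: cosine(query, pool[j]))
        if best == i:
            hits += 1
    return hits / len(queries)


# Toy embeddings (hypothetical): the third query is closer to pool[0]
# than to its true match pool[2], so it counts as a miss.
pool = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.6, 0.0, 0.5]]
accuracy = pool_top1_accuracy(queries, pool)
```

In a Siamese (bi-encoder) setup of this kind, each function is embedded once and the pool embeddings can be reused across all queries, which is the usual source of the cost saving compared with scoring every pair jointly. A larger pool contains more distractors, which is consistent with the lower accuracy reported for the 10000-function pool.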

Binary code, Similarity detection, Semantic information, SBERT network structure, Jump recognition mechanism

Yan Yintong (严尹彤), Yu Lu (于璐), Wang Taiyan (王泰彦), Li Yuwei (李宇薇), Pan Zulie (潘祖烈)

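College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China

Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei 230037, China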

Young Scientists Fund of the National Natural Science Foundation of China

62202484

2024

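Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)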

CSTPCD, Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(5)