An Angular-Margin-Based Binary Function Contrastive Learning Framework
Sun Ruijin (孙瑞锦) 1, Guo Shize (郭世泽) 2, Li Wei (黎维) 3, Zhan Dazhi (詹达之) 1, Wang Jun (王军) 4, Pan Zhisong (潘志松) 1
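Author information
- 1. College of Command and Control Engineering, Army Engineering University, Nanjing, Jiangsu 210007
- 2. National Computer Network and Information Security Management Center, Beijing 100029
- 3. Army Academy of Armored Forces, Beijing 100072
- 4. Unit 31306, Chengdu, Sichuan 610036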
Abstract
Existing code similarity detection models focus primarily on encoder construction, and the loss functions used to train them have received little study. To address the neglected problem of evaluating binary function embedding vectors, this paper proposes an angular-margin-based binary code contrastive learning framework (AngCLF). Optimizing the contrastive learning objective improves the model's accuracy and speeds up convergence. The paper also analyzes why the model is effective and introduces several metrics for evaluating the binary code vector space. Experiments show that AngCLF is more accurate than six models, including jTrans, converges faster, and has clear advantages on metrics such as alignment and uniformity.
Key words
contrastive learning / angular margin / embedding learning / binary code similarity detection (BCSD)
Publication year
2024