Program semantic analysis model for code reuse detection

Guo Xi¹, Wang Pan²

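Author information

1. College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China
2. Hubei Key Laboratory for High-efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan 430068, Hubei, China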

Abstract

Program similarity analysis is widely used in areas such as code defect detection, plagiarism detection, and intellectual property protection, but existing approaches generally suffer from problems such as excessive computational overhead. To address this, a code similarity analysis method based on fuzzy matching and statistical inference is proposed. A binary program is first disassembled, and function boundary identification is then performed to extract the execution boundary information of each function. On this basis, dynamic programming is applied at basic-block granularity to compute the similarity between basic blocks, and a neighborhood search over the control flow graph extends the similarity analysis from the basic-block level to the function level. Finally, the semantic similarity of the binary files is obtained through statistical analysis of the similarity functions. During this process, the pre-trained model is optimized and its parameters are tuned, which enables similarity analysis of cross-platform code. Experimental results show that, compared with current mainstream analysis tools, the proposed method improves analysis accuracy by 7.1% on average.
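The block-to-function pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, so a normalized longest common subsequence (LCS) over instruction mnemonics is assumed for the dynamic programming step, a greedy successor-by-successor matching is assumed for the neighborhood search, and a plain mean is assumed for the final statistical aggregation. The names `block_similarity`, `function_similarity`, `binary_similarity`, and the `Cfg` layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical CFG layout: block id -> (normalized mnemonics, successor block ids).
Cfg = Dict[str, Tuple[List[str], List[str]]]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def block_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed fuzzy score in [0, 1]: normalized LCS of two instruction sequences."""
    if not a and not b:
        return 1.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


def function_similarity(f: Cfg, g: Cfg, f_entry: str, g_entry: str) -> float:
    """Greedy neighborhood search: pair the entry blocks, then repeatedly pair
    each unmatched successor in f with the most similar unused successor in g."""
    matched = {f_entry: g_entry}
    used = {g_entry}
    total = block_similarity(f[f_entry][0], g[g_entry][0])
    frontier = [(f_entry, g_entry)]
    while frontier:
        fb, gb = frontier.pop()
        for fs in f[fb][1]:
            if fs in matched:
                continue
            candidates = [gs for gs in g[gb][1] if gs not in used]
            if not candidates:
                continue
            best = max(candidates, key=lambda gs: block_similarity(f[fs][0], g[gs][0]))
            matched[fs] = best
            used.add(best)
            total += block_similarity(f[fs][0], g[best][0])
            frontier.append((fs, best))
    # Unmatched blocks contribute zero, so size differences lower the score.
    return total / max(len(f), len(g))


def binary_similarity(function_scores: Sequence[float]) -> float:
    """Assumed aggregation: mean of per-function scores (a stand-in for the
    statistical analysis mentioned in the abstract)."""
    return sum(function_scores) / len(function_scores) if function_scores else 0.0
```

In this sketch, two functions whose control flow graphs have the same shape and the same instruction sequences score 1.0 regardless of how their blocks are named, while each unmatched or partially matching block pulls the function-level score down.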

Keywords

program analysis/fuzzy matching/statistical inference/machine learning

Publication year

2024

Journal on Communications

China Institute of Communications

Indexed in: CSTPCD, CSCD, Peking University Core Journals

Impact factor: 1.265

ISSN: 1000-436X