Journal of Cyber Security (信息安全学报), 2024, Vol. 9, Issue 3: 138-156. DOI: 10.19363/J.cnki.cn10-1380/tn.2022.12.04

Class Information Recovery Technology for COTS C++ Binaries

Yang Jin 1, Gong Xiaorui 1, Wu Wei 1, Zhang Bolun 1

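Author information

  • 1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China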

Abstract

Software written in C++ has long been one of the hardest targets for binary reverse engineering. Binary code does not retain C++ classes or their inheritance information, and release builds enable compiler optimization by default, which strips away much of the information that would otherwise remain. This makes reverse engineering commercial off-the-shelf (COTS) C++ binaries particularly difficult. Existing work has two shortcomings. First, it does not fully account for compiler optimization, so its recognition rate for classes and their inheritance relationships is low on optimized binaries, and it struggles to identify complex relationships such as virtual inheritance. Second, its recognition algorithms are inefficient and cannot scale to the reverse engineering of large software. This paper studies the identification of classes and their inheritance relationships in C++ binaries under compiler optimization and makes improvements in three respects. First, it uses inter-procedural static taint analysis to extract object memory layouts from C++ binaries, which effectively resists the effects of compiler optimization (inlined constructors). Second, it introduces four heuristics that recover lost information from optimized C++ binaries. Third, it develops an adaptive control flow graph (CFG) generation algorithm that greatly improves analysis efficiency with minimal loss. Building on these, a prototype system, RECLASSIFY, is implemented; it effectively identifies polymorphic classes and class inheritance relationships (including virtual inheritance) in C++ binaries. Experiments show that under both the MSVC ABI and the Itanium ABI, RECLASSIFY identifies most polymorphic classes and recovers class relationships from optimized binaries in a short time. On a dataset of C++ binaries from 15 real-world software packages (compiled with O2 optimization), RECLASSIFY achieves an average recall of 84.36% for polymorphic class recovery under the MSVC ABI, whereas the previous state-of-the-art solution, OOAnalyzer, achieves an average recall of only 33.76%. In addition, RECLASSIFY improves analysis efficiency by three orders of magnitude compared with OOAnalyzer.
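
The first contribution named in the abstract, extracting object memory layouts with inter-procedural static taint analysis, can be illustrated with a minimal sketch. The Python below is not RECLASSIFY's implementation. The tuple-based instruction format, the `arg0` argument register, and every function and vtable name are hypothetical, chosen only to show how tracking a tainted object pointer across calls could collect the vtable-pointer stores that remain after a constructor has been inlined.

```python
from collections import namedtuple

# Toy instruction forms (a hypothetical IR, not real disassembler output):
#   ("mov", dst, src)            dst = src
#   ("lea", dst, src, off)       dst = src + off
#   ("store", base, off, const)  *(base + off) = const
#   ("call", callee)             call callee; first argument is in "arg0"
VtableWrite = namedtuple("VtableWrite", "offset vtable")


def track_object_writes(functions, entry, this_offset=0, active=None):
    """Follow a tainted object pointer through `entry` and its callees.

    Returns the constant stores into the object in execution order, with
    offsets relative to the start of the outermost object.
    """
    active = set() if active is None else active
    if (entry, this_offset) in active:    # guard against recursive calls
        return []
    active = active | {(entry, this_offset)}

    tainted = {"arg0": this_offset}        # register -> offset into object
    writes = []
    for insn in functions[entry]:
        op = insn[0]
        if op == "mov":
            _, dst, src = insn
            if src in tainted:
                tainted[dst] = tainted[src]
            else:
                tainted.pop(dst, None)     # overwritten with untainted data
        elif op == "lea":
            _, dst, src, off = insn
            if src in tainted:
                tainted[dst] = tainted[src] + off
            else:
                tainted.pop(dst, None)
        elif op == "store":
            _, base, off, const = insn
            if base in tainted:
                writes.append(VtableWrite(tainted[base] + off, const))
        elif op == "call":
            _, callee = insn
            # Inter-procedural step: descend only if the object pointer
            # (or a pointer into the object) is passed as the argument.
            if "arg0" in tainted and callee in functions:
                writes += track_object_writes(
                    functions, callee, tainted["arg0"], active)
    return writes


# Hypothetical optimized code: the Base constructor has been inlined into
# Derived_ctor, so both vtable stores appear in one function body.
functions = {
    "Derived_ctor": [
        ("mov", "rbx", "arg0"),
        ("store", "rbx", 0, "vtable_Base"),     # from the inlined Base ctor
        ("store", "rbx", 0, "vtable_Derived"),  # Derived overwrites the slot
        ("lea", "arg0", "rbx", 16),             # address of a member object
        ("call", "Member_ctor"),
    ],
    "Member_ctor": [
        ("store", "arg0", 0, "vtable_Member"),
    ],
}

layout = track_object_writes(functions, "Derived_ctor")
for write in layout:
    print(write.offset, write.vtable)
```

In this toy input, the two consecutive stores at offset 0 survive even though the base constructor was inlined. An analysis could treat that overwrite pattern as a hint that one class derives from the other, and the store reached through the call at offset 16 as a hint of an embedded polymorphic member.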

Key words

binary analysis / class inheritance recovery / static taint analysis / adaptive CFG generation algorithm

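Funding

Beijing Municipal Science and Technology Program, project on cultivating talent with special cyberspace attack and defense skills and building a supporting platform (Z181100002718002)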

Publication year

2024
Indexed in CSTPCD and CSCD
Number of references: 38