PDF Malicious Indicators Extraction Technique Based on Improved Symbolic Execution
Malicious PDF documents are a common attack vector used by APT organizations. Analyzing indicators extracted from embedded JavaScript code is an important means of determining whether a document is malicious. However, attackers can use heavy obfuscation, sandbox detection, and other evasion methods to interfere with analysis. This paper therefore applies symbolic execution to PDF indicator extraction. We propose a PDF malicious indicator extraction technique based on improved symbolic execution and implement SYMBPDF, an indicator extraction system consisting of three modules: code parsing, symbolic execution, and indicator extraction. In the code parsing module, we extract and reorganize inline JavaScript code. In the symbolic execution module, we design a code rewriting method that forces branch shifting, thereby improving the code coverage of symbolic execution. We also design a concurrency strategy and two constraint-solving optimization methods to improve efficiency. In the indicator extraction module, we integrate and record malicious indicators. Indicators are extracted from 1,271 malicious samples and evaluated. The success rate of indicator extraction is 92.2% and the indicator effectiveness is 91.7%; compared with the system before optimization, code coverage is 8.5% higher and system performance is 32.3% higher.
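
The idea of rewriting code to force branch shifting can be illustrated with a toy sketch. The abstract does not describe how SYMBPDF performs the rewriting, so everything below is an assumption made for illustration: the function names `find_if_conditions` and `force_branches` are hypothetical, the approach (replacing each `if` condition with a literal `true` or `false`) is only one plausible reading of "forcing" a branch, and the parenthesis scanner ignores string literals and comments that a real JavaScript parser would handle.

```python
import re
from itertools import product

# Matches the head of a JavaScript `if (` statement. Hypothetical helper
# pattern for this sketch only, not part of SYMBPDF.
IF_HEAD = re.compile(r"\bif\s*\(")


def find_if_conditions(src):
    """Return (start, end) spans of each `if (...)` condition.

    The spans exclude the enclosing parentheses. Parentheses are balanced
    by counting, so string literals containing '(' or ')' are NOT handled.
    """
    spans = []
    for match in IF_HEAD.finditer(src):
        depth, i = 1, match.end()
        while i < len(src) and depth:
            if src[i] == "(":
                depth += 1
            elif src[i] == ")":
                depth -= 1
            i += 1
        if depth == 0:
            spans.append((match.end(), i - 1))
    return spans


def force_branches(src):
    """Return every variant of `src` with each condition forced to true/false.

    Assumption: forcing a branch means overwriting its condition with a
    literal. The number of variants grows as 2**n for n conditions, so this
    is only practical for small snippets.
    """
    spans = find_if_conditions(src)
    variants = []
    for choice in product(("true", "false"), repeat=len(spans)):
        parts, last = [], 0
        for (start, end), value in zip(spans, choice):
            parts.append(src[last:start])
            parts.append(value)
            last = end
        parts.append(src[last:])
        variants.append("".join(parts))
    return variants


if __name__ == "__main__":
    # Illustrative snippet: a version check of the kind used to evade sandboxes.
    snippet = "if (app.viewerVersion < 9) { benign(); } else { payload(); }"
    for variant in force_branches(snippet):
        print(variant)
```

Each variant could then be analyzed separately, so that code hidden behind an environment check is reached regardless of how the check would evaluate in the analysis environment. Enumerating all combinations is exponential in the number of conditions, which is one reason a practical system would need the kind of concurrency and constraint-solving optimizations the abstract mentions.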