Application of ACFG Based on N-gram Improved Features in GCC Compiler Version Identification
This article explores an improved attributed control flow graph (ACFG) built on N-gram features, combined with an optimized LightGBM classifier, to identify GCC compiler versions precisely. The research focuses on two tasks: extracting key features and constructing the discriminant function. To identify the key features of compilation results, an N-gram association model was constructed that correlates statistical features of registers and opcodes, so that local features within code blocks are fully preserved. Building on the improved ACFG framework, the aggregated graph features associated with the N-grams capture the contextual information between the code blocks of an instruction sequence. For the discriminant function, experiments confirmed a significant advantage of the LightGBM classifier in handling complex features, and Bayesian optimization was used to tune its hyperparameters. The article concludes with suggestions for further improving model performance, for example through optimization with Generative Adversarial Networks (GANs).
Keywords: GCC compiler version identification; N-gram; ACFG; LightGBM
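The abstract combines two feature levels: local N-gram statistics of opcodes and registers inside each basic block, and graph-level features obtained by aggregating those statistics over the control flow graph. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The toy instruction format, the register subset `REGISTERS`, the helper names `opcode_ngrams`, `block_features`, `aggregate` and `graph_vector`, the use of (opcode, register) co-occurrence counts as the "association" feature, and the rule "own features plus the sum of undirected neighbours' vectors" are all hypothetical choices made for illustration. The resulting sparse vector is the kind of input that could then be passed to a classifier such as LightGBM, which is not shown here.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy representation: one instruction is (opcode, operands),
# and a basic block is a list of instructions from a disassembler.
Instruction = Tuple[str, List[str]]

# Assumption: a small subset of x86-64 register names, for illustration only.
REGISTERS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp", "eax", "edi"}


def opcode_ngrams(opcodes: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return the contiguous n-grams of an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def block_features(block: List[Instruction], n: int = 2) -> Counter:
    """Local features of one basic block (hypothetical feature set):
    opcode n-gram counts, register counts, and (opcode, register)
    co-occurrence counts standing in for the opcode/register association."""
    feats: Counter = Counter()
    opcodes = [op for op, _ in block]
    for gram in opcode_ngrams(opcodes, n):
        feats["ng:" + " ".join(gram)] += 1
    for op, operands in block:
        for operand in operands:
            if operand in REGISTERS:
                feats["reg:" + operand] += 1
                feats["op_reg:" + op + "/" + operand] += 1
    return feats


def aggregate(node_feats: Dict[int, Counter],
              edges: List[Tuple[int, int]],
              rounds: int = 1) -> Dict[int, Counter]:
    """Assumed aggregation rule: in each round, a block's vector becomes its
    own local features plus the sum of its neighbours' vectors from the
    previous round, treating CFG edges as undirected."""
    neighbours: Dict[int, List[int]] = {v: [] for v in node_feats}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    current = {v: Counter(f) for v, f in node_feats.items()}
    for _ in range(rounds):
        nxt: Dict[int, Counter] = {}
        for v in node_feats:
            acc = Counter(node_feats[v])  # copy of the block's own features
            for u in neighbours[v]:
                acc.update(current[u])    # Counter.update adds counts
            nxt[v] = acc
        current = nxt
    return current


def graph_vector(node_vectors: Dict[int, Counter]) -> Counter:
    """Graph-level feature vector: sum of all aggregated block vectors."""
    total: Counter = Counter()
    for vec in node_vectors.values():
        total.update(vec)
    return total


# Toy function: block 0 branches to block 1 or block 2.
blocks = {
    0: [("push", ["rbp"]), ("mov", ["rbp", "rsp"]), ("test", ["edi", "edi"])],
    1: [("mov", ["eax", "1"]), ("pop", ["rbp"]), ("ret", [])],
    2: [("xor", ["eax", "eax"]), ("pop", ["rbp"]), ("ret", [])],
}
edges = [(0, 1), (0, 2)]

local = {v: block_features(b) for v, b in blocks.items()}
vec = graph_vector(aggregate(local, edges, rounds=1))
print(vec["ng:pop ret"])  # 4
```

With one round, each block's vector mixes in its direct neighbours, so a "pop ret" epilogue pattern is counted once per block that contains or neighbours it. More rounds widen the context each block sees, at the cost of blurring local detail.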