With the rapid growth of Web applications and their use in various fields, the number of vulnerabilities in Web applications has increased. Injection vulnerabilities are the most widespread and destructive class of Web application vulnerabilities. However, the information extracted by existing vulnerability detection tools misses semantic information related to the vulnerability and contains a large amount of noise unrelated to it, which leads to false positives and false negatives. To solve this problem, an intermediate language representation named Alpherg is proposed. It retains the code information, extracts only the semantic information related to the vulnerability, and represents the control flow information in the source code. Vulnerability features extracted with Alpherg discard the noise unrelated to the vulnerability, retain the context information in the source code, and take a form that is independent of the original programming language. Based on Alpherg, a PHP injection vulnerability detection model using Bi-LSTM and an attention mechanism is proposed. The model uses Bi-LSTM to capture the context relationships in Alpherg's long sequence representation. An attention mechanism is then added to the model: by calculating the attention distribution at each time step, it exploits the vulnerability-related information in the Alpherg representation and improves the model's detection ability. A comparison of Alpherg with other methods shows that it can directly and accurately extract information related to the vulnerability, avoid noise, and retain the semantic information of the vulnerability. The proposed model is evaluated on the SARD dataset. The results show that its vulnerability detection accuracy is 98%, which is higher than that of three static detection tools and of a PHP token-based deep learning vulnerability detection model, demonstrating the feasibility and effectiveness of the method.
Key words
injection vulnerability detection/deep learning/semantic features of vulnerabilities/code slicing
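
As an illustration of the attention step described in the abstract, the following is a minimal sketch of computing an attention distribution over per-time-step hidden states and pooling them into one context vector. The abstract does not give the model's exact scoring function, so the dot-product scoring, the function name `attention_pool`, the `query` vector, and the toy hidden states (stand-ins for Bi-LSTM outputs) are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    hidden_states: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for a sequence of hidden states.

    Assumption: scores are plain dot products with a query vector; the
    paper's actual scoring function may differ.
    """
    if not hidden_states:
        raise ValueError("hidden_states must not be empty")

    # One scalar score per time step: dot product of hidden state and query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]

    # Numerically stable softmax turns scores into an attention distribution.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Hypothetical hidden states for three time steps (toy values, not real data).
states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
weights, context = attention_pool(states, query)
```

In a full model, the hidden states would come from the Bi-LSTM and the query would be a learned parameter; the context vector would then feed a classifier that predicts whether the sample is vulnerable.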