Research on PHP Injection Vulnerability Detection Method Based on Intermediate Language

Zhang Guodong (张国栋)¹, Liu Zilong (刘子龙)¹, Yao Tianyu (姚天宇)¹, Jin Zhuo (靳卓)¹, Sun Donghong (孙东红)², Qin Jiawei (秦佳伟)³


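Author information

1. School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China
2. Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China
3. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China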

Abstract

Web applications are growing rapidly in number and are used across many fields, and the number of vulnerabilities in them has grown accordingly. Injection vulnerabilities are the most widespread and destructive class of Web application vulnerabilities. The information extracted by vulnerability detection tools, however, omits part of the vulnerability-related semantic information and contains a large amount of noise unrelated to the vulnerability, which leads to false positives and false negatives. To address this problem, an intermediate language representation named Alpherg is proposed. It retains source code information, extracts only the vulnerability-related semantic information from the source code, and represents the control flow of the source code. When Alpherg is used for vulnerability feature extraction, the resulting representation discards noise unrelated to the vulnerability, retains the context information of the source code, is independent in form of the original programming language, and is readable. Based on features extracted with Alpherg, a PHP injection vulnerability detection model built on Bi-LSTM and an attention mechanism is proposed. The Bi-LSTM captures the contextual relationships in the long Alpherg sequence representation. The attention mechanism then computes an attention distribution over the time steps, so that the vulnerability-related information in Alpherg is better exploited and the model's detection ability improves. Alpherg is compared with other feature extraction methods, and the results show that it accurately extracts information directly related to the vulnerability, avoids introducing excessive noise, and preserves the semantic information of the vulnerability. The proposed detection model is evaluated on the SARD dataset, where it reaches a vulnerability detection accuracy of 98%, higher than the three static detection tools and the PHP token-based deep learning detection model used for comparison, which demonstrates the feasibility and effectiveness of the method.
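The attention step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how attention scores are computed, so the dot-product scoring against a query vector used here is an assumption (one common formulation), and the names `attention_pool`, `hidden_states`, and `query` are hypothetical. The hidden states stand in for per-time-step Bi-LSTM outputs over an Alpherg token sequence.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step and return (weights, context vector).

    hidden_states: list of T vectors, standing in for Bi-LSTM outputs.
    query: vector of the same dimension; in a trained model this would be
           learned. Dot-product scoring is an assumption, not the paper's
           stated method.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)  # attention distribution over time steps
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy input: three time steps, two-dimensional hidden states.
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
weights, context = attention_pool(hidden_states, query)
```

In this toy input the second time step has the largest score, so it receives the largest weight and dominates the context vector that a classifier would consume.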

Key words

injection vulnerability detection/deep learning/semantic features of vulnerabilities/code slicing


Publication year

2024

Journal of Cyber Security (信息安全学报)

CSTPCD, CSCD

