首页|基于抽象语法树特征迁移的软件漏洞检测方法(AST-FMVD)

基于抽象语法树特征迁移的软件漏洞检测方法(AST-FMVD)

扫码查看
深度学习在漏洞检测的应用取得了显著的进展。现有漏洞检测算法需要大量的标记数据,通过有监督的方法构建检测模型,在多语言环境中,由于语言的多样性和标记训练样本的缺乏,检测模型可能存在泛化性问题,特别是在小样本领域中可能表现不佳。为了解决这一困境,迁移学习可以作为一种解决方案,迁移学习的核心思想是以"举一反三"为核心的算法框架,将某个领域的知识迁移到另一个领域的学习中,从而打破样本数据的制约。该文提出了一种基于特征迁移的漏洞检测方法。通过语义相似性对代码的语法树节点信息进行聚类,可以快速并准确地构建好不同语言之间的节点映射关系,同时在语法树的映射过程中引入上下文感知技术帮助解决歧义或模糊的语法结构,提高解析性能。该方法实现检测样本从未知领域变换到已知领域,利用在原有领域构建的深度学习模型,可以将新领域任务迁移到已知领域,最终解决跨域的知识迁移的应用,并将该方法取名为AST-FMVD。最后通过Java的漏洞检测模型对含有特定漏洞文件的进行检测,实现模型在Python领域中的迁移应用,证明了AST-FMVD的可行性,并通过实验证明AST-FMVD可以实现源域中的训练模型在目标领域仍可以保证原模型良好的检测水平。
Software Vulnerability Detection Method Based on Abstract Syntax Tree Feature Migration(AST-FMVD)
Deep learning has made significant progress in vulnerability detection.Existing vulnerability detection algorithms require a large amount of labeled data and build detection models through supervised methods.In a multi-language environment,due to the diversity of languages and the lack of labeled training samples,the detection model may have generalization problems,especially in the field of small samples,where performance may be poor.To solve this dilemma,transfer learning can serve as a solution.The core idea of transfer learning is the"learning by analogy"algorithm framework,transferring knowledge from one domain to another,thereby breaking the constraints of sample data.We propose a feature-based transfer vulnerability detection method.By clustering the syntax tree node in-formation of the code through semantic similarity,the node mapping relationship between different languages can be quickly and accurately constructed.At the same time,context-aware technology is introduced in the syntax tree mapping process to help solve ambiguous or vague grammatical structures,improving parsing performance.The proposed method enables the detection samples to transform from unknown domains to known ones,and utilizing the deep learning model built in the original domain,the new domain task can be transferred to the known domain,ultimately solving the application of cross-domain knowledge transfer.It is named AST-FMVD.Finally,we use the Java vulnerability detection model to detect files containing specific vulnerabilities,realizing the model's transfer application in the Python domain,proving the feasibility of AST-FMVD,and experimentally demonstrating that AST-FMVD can ensure the original model's good detection level in the target domain.

deep learningtransfer learningzero-shotvulnerability detectionabstract syntax tree

李子俊、李涛、陈浩东、余琴、乔梦晴、李琳、王颉、万振华、宋荆汉

展开 >

武汉科技大学 计算机科学与技术学院,湖北 武汉 430065

智能信息处理与实时工业系统湖北省重点实验室,湖北 武汉 430065

深圳开源互联网安全技术有限公司,广东 深圳 518000

深度学习 迁移学习 零样本 漏洞检测 抽象语法树

武汉市重点研发计划武汉科技大学研究生教学改革研究项目

2022012202015070Yjg202111

2024

计算机技术与发展
陕西省计算机学会

计算机技术与发展

CSTPCD
影响因子:0.621
ISSN:1673-629X
年,卷(期):2024.34(6)
  • 25