Software Vulnerability Detection Based on Mixed Representation and Cooperative Training (基于混合表征和协同训练的软件漏洞检测)

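Abstract: Benchmark datasets for vulnerability detection are scarce, which limits the generalization ability of deep learning models, and traditional rule-engine-based vulnerability detection tools perform poorly. To address both problems, a software source code vulnerability detection method based on mixed representation and cooperative training is proposed. First, a pre-trained model extracts text features from the source code to capture its semantic information; a tool then generates the abstract syntax tree (AST), custom traversal rules extract the AST features of the source code, and the two kinds of features are mixed to enrich the code representation. Second, multiple deep models are built, and a cooperative training algorithm uses a large amount of unlabeled data to improve the generalization ability of each model. Because a single model may produce high false negative and false positive rates, and because one model may dominate the prediction results, a multi-model ensemble method based on a weighted voting mechanism is adopted. Experimental results show that the method alleviates, to some extent, the poor generalization caused by scarce datasets, has certain advantages on all metrics over several mainstream vulnerability detection methods, and outperforms the rule engine Fortify.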
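The cooperative training step in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid `CentroidModel` is a stand-in for the deep models, the two feature views are stand-ins for the text and AST features, and the names `co_train`, `rounds`, and `threshold` are hypothetical. The sketch assumes binary labels (1 = vulnerable) and that both labels occur in the labeled set.

```python
from statistics import mean


class CentroidModel:
    """Nearest-centroid classifier on one feature view (stand-in for a deep model)."""

    def __init__(self, view):
        self.view = view  # indices of the features this model is allowed to see
        self.centroids = {}

    def _project(self, x):
        return [x[i] for i in self.view]

    def fit(self, X, y):
        # Assumes both labels (0 and 1) are present in y.
        self.centroids = {}
        for label in (0, 1):
            rows = [self._project(x) for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]

    def predict_proba(self, x):
        """Return a score in [0, 1] for label 1, from distances to both centroids."""
        p = self._project(x)
        d0, d1 = (
            sum((a - b) ** 2 for a, b in zip(p, self.centroids[label])) ** 0.5
            for label in (0, 1)
        )
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)


def co_train(models, X_lab, y_lab, X_unlab, rounds=3, threshold=0.8):
    """Each model pseudo-labels confident unlabeled samples for the other models."""
    pools = [(list(X_lab), list(y_lab)) for _ in models]  # one training pool per model
    unlabeled = list(X_unlab)
    for _ in range(rounds):
        for model, (X, y) in zip(models, pools):
            model.fit(X, y)
        still_unlabeled = []
        for x in unlabeled:
            teacher = None
            for i, model in enumerate(models):
                p = model.predict_proba(x)
                if max(p, 1 - p) >= threshold:  # confident enough to teach
                    teacher, label = i, int(p >= 0.5)
                    break
            if teacher is None:
                still_unlabeled.append(x)
                continue
            for j, (X, y) in enumerate(pools):
                if j != teacher:  # the teacher does not learn from its own guess
                    X.append(x)
                    y.append(label)
        unlabeled = still_unlabeled
    for model, (X, y) in zip(models, pools):
        model.fit(X, y)  # final fit on the enlarged pools
    return models, unlabeled
```

Calling `co_train([CentroidModel((0, 1)), CentroidModel((2, 3))], X_lab, y_lab, X_unlab)` returns the retrained models together with the samples no model was confident about. Only the first confident model teaches the others in each pass, which keeps the sketch short; other schemes are possible.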

English title: Software Vulnerability Detection Based on Mixed Representation and Cooperative Training

English abstract: To address the poor generalization ability of deep learning models caused by the scarcity of benchmark datasets in the vulnerability domain, and the low performance of traditional rule-engine-based vulnerability detection tools, a software source code vulnerability detection method based on mixed representation and cooperative training is proposed. First, source code text features carrying code semantic information are extracted with a pre-trained model. Then, a tool generates the abstract syntax tree (AST), AST features of the source code are extracted by custom traversal rules, and the two kinds of features are mixed to enrich the code representation. Second, multiple deep models are built, and the generalization ability of each model is improved with a large amount of unlabeled data through a cooperative training algorithm. Because a single model may produce high false negative and false positive rates, and because one model may dominate the prediction results, a multi-model ensemble method based on a weighted voting mechanism is adopted. The experimental results show that the proposed method alleviates, to some extent, the poor model generalization caused by small datasets. Compared with some mainstream methods in the field of vulnerability detection, the proposed method has certain advantages on all metrics, and its detection performance is higher than that of the rule engine Fortify.
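The weighted voting ensemble mentioned in the abstract can be sketched as follows. The abstract does not say how the weights are chosen, so this is only an illustration: the function name `weighted_vote`, the use of per-model probabilities, and the idea of deriving weights from a validation score are all assumptions.

```python
def weighted_vote(probabilities, weights, threshold=0.5):
    """Combine per-model P(vulnerable) scores into one decision.

    probabilities: one score in [0, 1] per model for the same code sample.
    weights: one non-negative weight per model (hypothetical choice: each
             model's validation F1 score).
    Returns (weighted average score, True if the sample is flagged as vulnerable).
    """
    if not weights or len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(probabilities, weights)) / total
    return score, score >= threshold
```

With equal weights, `weighted_vote([0.9, 0.2, 0.3], [1.0, 1.0, 1.0])` does not flag the sample, because two of the three models disagree with the confident one. Raising the first model's weight to 3.0 flips the decision, which shows why the weights need to be set with care so that no single model dominates.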

English keywords:

deep learning; mixed feature; vulnerability detection; cooperative training; ensemble learning

Authors:

Chen Haodong (陈浩东), Li Lin (李琳), Qiao Mengqing (乔梦晴), Ye Biao (叶彪)

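Affiliations:

School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065

Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, Hubei 430065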

Keywords:

deep learning; mixed representation; vulnerability detection; cooperative training; ensemble learning

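Funding:

Wuhan Key Research and Development Program; Hubei Provincial Department of Education project; Hubei Provincial College Students' Innovation and Entrepreneurship Training Program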

Project numbers:

2022012202015070; 2020354; S202110488047

Publication year:

2024

DOI:

10.20165/j.cnki.ISSN1673-629X.2024.0050

Computer Technology and Development (计算机技术与发展)

Shaanxi Computer Society


CSTPCD

Impact factor: 0.621

ISSN: 1673-629X

Year, volume (issue): 2024, 34(5)

Number of references: 20