Software Vulnerability Detection Based on Mixed Representation and Cooperative Training


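Deep learning models for vulnerability detection generalize poorly because few benchmark datasets exist in the vulnerability domain, and traditional rule-engine-based detection tools perform poorly. To address both problems, a software source code vulnerability detection method based on mixed representation and cooperative training is proposed. First, a pre-trained model extracts text features from the source code to capture its semantic information; a tool then generates the abstract syntax tree (AST), AST features are extracted with custom traversal rules, and the two kinds of features are mixed to enrich the code representation. Second, multiple deep models are built, and a cooperative training (co-training) algorithm uses a large amount of unlabeled data to improve the generalization ability of each model. Because a single model may produce high false negative and false positive rates, and the final prediction could be dominated by one model, a multi-model ensemble based on a weighted voting mechanism is adopted. Experimental results show that the method partly alleviates the poor generalization caused by small datasets. Compared with several mainstream vulnerability detection methods, it shows some advantage on each evaluation metric, and its detection performance exceeds that of the rule engine Fortify.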
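The abstract does not specify the traversal rules or how the two feature types are fused. As a minimal sketch, assuming a pre-order traversal, a bag-of-node-types vector over a small vocabulary, and fusion by plain concatenation, the mixed representation could be built as below. Python's standard `ast` module stands in for the authors' AST-generation tool; `NODE_VOCAB`, `ast_node_sequence`, `ast_feature_vector`, and `mixed_representation` are hypothetical names, and the text embedding is a placeholder for a pre-trained model's output.

```python
import ast
from collections import Counter

# Hypothetical node-type vocabulary; Python node names stand in for the
# AST produced by whatever tool and source language the paper targets.
NODE_VOCAB = ["FunctionDef", "Call", "If", "For", "While",
              "Assign", "Subscript", "BinOp", "Compare", "Return"]


def ast_node_sequence(source: str) -> list[str]:
    """Pre-order (depth-first) traversal recording each node's type name (assumed rule)."""
    order: list[str] = []

    def visit(node: ast.AST) -> None:
        order.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return order


def ast_feature_vector(source: str) -> list[float]:
    """Bag of node types over NODE_VOCAB, normalized to relative frequencies."""
    counts = Counter(ast_node_sequence(source))
    total = sum(counts[name] for name in NODE_VOCAB) or 1
    return [counts[name] / total for name in NODE_VOCAB]


def mixed_representation(text_embedding: list[float], source: str) -> list[float]:
    """Fuse the semantic (text) view and the structural (AST) view by concatenation."""
    return list(text_embedding) + ast_feature_vector(source)


if __name__ == "__main__":
    snippet = (
        "def copy(buf, src, n):\n"
        "    for i in range(n):\n"
        "        buf[i] = src[i]\n"
        "    return buf\n"
    )
    text_emb = [0.12, -0.40, 0.33]  # placeholder for a pre-trained model's embedding
    vec = mixed_representation(text_emb, snippet)
    print(len(vec))  # 13: 3 text dimensions + 10 AST dimensions
```

Concatenation keeps the semantic and structural parts in separate, fixed slices of the vector; a learned fusion layer would be an alternative, and the abstract does not say which the authors used.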
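The abstract names a co-training algorithm and a weighted voting ensemble but not the models, the confidence measure, or how the weights are chosen. The sketch below assumes a shared-pool variant of co-training: each model is trained on its own slice of the mixed feature vector, moves its most confident pseudo-labeled samples from the unlabeled pool into the shared training set, and the final prediction sums each model's class probabilities scaled by a per-model weight. `CentroidClassifier` is a toy nearest-centroid model standing in for the deep models; `co_train`, `weighted_vote`, the threshold, and the example weights are all hypothetical.

```python
import math
from dataclasses import dataclass, field

Vector = list[float]


@dataclass
class CentroidClassifier:
    """Toy stand-in for one deep model: nearest centroid on one feature view."""
    view: slice
    centroids: dict[int, Vector] = field(default_factory=dict)

    def fit(self, xs: list[Vector], ys: list[int]) -> None:
        sums: dict[int, Vector] = {}
        counts: dict[int, int] = {}
        for x, y in zip(xs, ys):
            v = x[self.view]
            acc = sums.setdefault(y, [0.0] * len(v))
            for i, value in enumerate(v):
                acc[i] += value
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict_proba(self, x: Vector) -> dict[int, float]:
        """Softmax over negative Euclidean distances to each class centroid."""
        v = x[self.view]
        scores = {y: -math.dist(v, c) for y, c in self.centroids.items()}
        top = max(scores.values())
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        total = sum(exps.values())
        return {y: e / total for y, e in exps.items()}


def co_train(models, xs, ys, unlabeled, rounds=5, per_round=1, threshold=0.7):
    """Shared-pool co-training: confident pseudo-labels from each model feed all models."""
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    for _ in range(rounds):
        for m in models:
            m.fit(xs, ys)
        added = False
        for m in models:
            scored = []
            for x in pool:
                proba = m.predict_proba(x)
                label = max(proba, key=proba.get)
                scored.append((proba[label], label, x))
            scored.sort(key=lambda item: item[0], reverse=True)
            for confidence, label, x in scored[:per_round]:
                if confidence >= threshold:
                    xs.append(x)
                    ys.append(label)
                    pool.remove(x)
                    added = True
        if not added or not pool:
            break
    for m in models:  # final fit on labeled + pseudo-labeled data
        m.fit(xs, ys)
    return models


def weighted_vote(models, weights, x) -> int:
    """Soft weighted voting: sum weight * probability per class, return the argmax."""
    totals: dict[int, float] = {}
    for m, w in zip(models, weights):
        for label, p in m.predict_proba(x).items():
            totals[label] = totals.get(label, 0.0) + w * p
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Hypothetical 4-dimensional mixed features: dims 0-1 form one view, dims 2-3 the other.
    labeled_x = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
    labeled_y = [0, 1]  # 0 = non-vulnerable, 1 = vulnerable
    unlabeled = [[0.1, 0.0, 0.0, 0.1], [0.9, 1.0, 1.0, 0.9],
                 [0.05, 0.1, 0.1, 0.0], [1.0, 0.95, 0.9, 1.0]]
    models = co_train([CentroidClassifier(slice(0, 2)), CentroidClassifier(slice(2, 4))],
                      labeled_x, labeled_y, unlabeled)
    print(weighted_vote(models, [0.6, 0.4], [0.95, 0.9, 0.85, 1.0]))  # 1
```

With soft weighted voting, each model contributes in proportion to its weight (for example, a validation score), so no single model decides the outcome unless its weight and confidence together outweigh the rest; how the authors actually set the weights is not stated in the abstract.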

Keywords: deep learning; mixed feature; vulnerability detection; cooperative training; ensemble learning

Authors: Chen Haodong, Li Lin, Qiao Mengqing, Ye Biao


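School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065

Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, Hubei 430065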


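Funding: Wuhan Key Research and Development Program; Hubei Provincial Department of Education; Hubei Province College Students' Innovation and Entrepreneurship Training Program

Grant numbers: 20220122020150702020354S202110488047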

2024

Computer Technology and Development (计算机技术与发展)
Shaanxi Computer Society


CSTPCD
Impact factor: 0.621
ISSN:1673-629X
Year, Vol. (No.): 2024, 34(5)