Software Vulnerability Detection Based on Mixed Representation and Cooperative Training
To address the poor generalization of deep learning models caused by the scarcity of benchmark datasets in the vulnerability domain, and the low detection performance of traditional rule-engine-based vulnerability detection tools, a software source code vulnerability detection method based on mixed representation and cooperative training is proposed. First, text features and semantic information of the source code are extracted with a pre-trained model. An abstract syntax tree (AST) is then generated with parsing tools, AST features are extracted using custom traversal rules, and the two kinds of features are mixed to enrich the code representation. Second, multiple deep models are built, and a cooperative training algorithm uses a large amount of unlabeled data to improve the generalization ability of each model. Because a single model may produce high false positive and false negative rates, and because one model may dominate the prediction results, a multi-model ensemble based on a weighted voting mechanism is adopted. Experimental results show that the proposed method alleviates, to some extent, the poor model generalization caused by limited datasets. Compared with several mainstream vulnerability detection methods, the proposed method shows advantages across the evaluation metrics, and its detection performance is higher than that of the rule engine Fortify.
Keywords: deep learning; mixed feature; vulnerability detection; cooperative training; ensemble learning
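
The weighted voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the weights are chosen or what decision threshold is used, so the function names `weighted_score` and `weighted_vote`, the example weights, and the 0.5 threshold below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def weighted_score(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-model vulnerability probabilities.

    probs[i]   -- probability from model i that the sample is vulnerable
    weights[i] -- non-negative weight for model i (hypothetical; for example,
                  it could be derived from each model's validation score)
    """
    if not probs or len(probs) != len(weights):
        raise ValueError("probs and weights must be non-empty and equal in length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Normalising by the weight sum keeps the score in [0, 1].
    return sum(p * w for p, w in zip(probs, weights)) / total


def weighted_vote(
    probs: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,  # assumed decision threshold
) -> int:
    """Return 1 (vulnerable) if the weighted score reaches the threshold, else 0."""
    return 1 if weighted_score(probs, weights) >= threshold else 0


if __name__ == "__main__":
    # Three hypothetical models; the first is weighted most heavily.
    print(weighted_vote([0.9, 0.4, 0.3], [0.5, 0.25, 0.25]))
    # Equal weights: the confident first model no longer decides the outcome.
    print(weighted_vote([0.9, 0.2, 0.1], [1.0, 1.0, 1.0]))
```

Averaging probabilities with bounded, normalised weights is one simple way to keep a single model from dominating the prediction; other weighting schemes would fit the same interface.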