Computer Engineering and Design, 2024, Vol. 45, Issue 2: 356-366. DOI: 10.16208/j.issn1000-7024.2024.02.005

PDF document detection model based on graph neural network and deep learning

Lei Jingwei (雷靖玮)¹, Yi Peng (伊鹏)¹, Chen Xiang (陈祥)¹

Author information

  • 1. Institute of Information Technology, Information Engineering University, Zhengzhou, Henan 450002, China

Abstract (Chinese-language version)

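To address the excessively high false-positive rate of traditional PDF document detection, a detection model based on graph neural networks and deep learning, DGNN, is proposed. The system call data produced by each thread while a document runs are collected to build the corresponding system call graphs, and the proposed H-index-based graph sampling strategy is applied to reduce the data scale. The sampled subgraphs serve as the input of DGNN: a graph convolutional network extracts the association relations, while deep learning extracts the attribute features of system call pairs and fuses the two kinds of features. Detection is then completed by classifying the properties of each system call graph. Experimental results show that, compared with other methods, the model needs less time for feature extraction and training and effectively improves PDF document detection performance.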
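The abstract does not state how edges are defined or how the H-index is turned into a sampling rule. The sketch below is a minimal illustration under explicit assumptions, not the authors' method: each thread's trace becomes a directed graph whose edge (a, b) counts how often syscall b directly follows syscall a; a node's H-index is the largest h such that it has at least h neighbours of degree at least h (edges treated as undirected, self-loops ignored); and sampling keeps the k highest-H-index nodes plus the edges among them. The function names `build_syscall_graphs`, `node_h_index`, and `h_index_sample` and the parameter `k` are hypothetical.

```python
from collections import defaultdict


def build_syscall_graphs(trace):
    """Build one directed, weighted graph per thread (assumed construction rule).

    trace: iterable of (thread_id, syscall_name) events in capture order.
    Edge (a, b) counts how often syscall b directly follows a in the same thread.
    """
    last_call = {}
    graphs = defaultdict(lambda: defaultdict(int))
    for tid, call in trace:
        if tid in last_call:
            graphs[tid][(last_call[tid], call)] += 1
        last_call[tid] = call
    return {tid: dict(edges) for tid, edges in graphs.items()}


def node_h_index(edges):
    """H-index per node: largest h with at least h neighbours of degree >= h.

    Edges are treated as undirected; self-loops are ignored for neighbours.
    """
    nodes = {n for edge in edges for n in edge}
    nbrs = defaultdict(set)
    for a, b in edges:
        if a != b:
            nbrs[a].add(b)
            nbrs[b].add(a)
    degree = {n: len(nbrs[n]) for n in nodes}
    h_index = {}
    for n in nodes:
        h = 0
        neighbour_degrees = sorted((degree[m] for m in nbrs[n]), reverse=True)
        for i, d in enumerate(neighbour_degrees, start=1):
            if d >= i:
                h = i
            else:
                break
        h_index[n] = h
    return h_index


def h_index_sample(edges, k):
    """Keep the k nodes with the highest H-index (ties broken by name)
    and return the induced subgraph with its original edge weights."""
    h = node_h_index(edges)
    keep = set(sorted(h, key=lambda n: (-h[n], n))[:k])
    return {e: w for e, w in edges.items() if e[0] in keep and e[1] in keep}


if __name__ == "__main__":
    # Triangle a-b-c plus a pendant node d attached to a.
    edges = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 1, ("a", "d"): 1}
    print(h_index_sample(edges, 3))  # {('a', 'b'): 1, ('b', 'c'): 1, ('c', 'a'): 1}
```

In this toy graph a, b, and c each have H-index 2 while the pendant node d has H-index 1, so sampling with k = 3 drops d and keeps the triangle. Ranking by H-index rather than raw degree favours nodes whose neighbours are themselves well connected; whether DGNN uses a top-k cut, a threshold, or another rule is not stated in the abstract.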

Abstract (English-language version)

To address the problem that traditional detection methods cannot cope effectively with malicious PDF documents and produce high false-positive rates, a detection model based on graph neural network and deep learning (DGNN) was introduced. A tracing tool captured the system calls made when a document was opened, and system call graphs were constructed and partitioned by thread. A graph sampling method based on the H-index was also proposed to reduce the data scale. The sampled subgraphs were used as the input of the model. The association relations were then extracted by a graph convolutional network, while the attribute features of system call pairs were extracted using deep learning and fused with them. Final detection was based on the properties of the system call graphs. Experimental results show that, compared with other methods, the proposed model requires less time for feature extraction and training and effectively improves the accuracy of PDF detection.
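The abstracts name a graph convolutional network and a feature-fusion step but give no layer sizes, readout, or fusion operator. As a minimal sketch under those gaps, the layer below applies the standard graph convolution propagation rule of Kipf and Welling, H' = ReLU(D̂^(-1/2) Â D̂^(-1/2) H W) with Â = A + I and D̂ the degree matrix of Â, then pools node embeddings into one graph vector by averaging and fuses it with an attribute-feature vector by concatenation. Mean pooling, concatenation, and a symmetric (undirected) adjacency matrix are assumptions, and `gcn_layer`, `mean_readout`, and `fuse` are hypothetical names; pure Python is used only to keep the example dependency-free.

```python
import math


def matmul(A, B):
    """Dense matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def gcn_layer(adj, H, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n symmetric adjacency matrix (a directed system call graph
    would first need to be symmetrised; that step is an assumption here).
    H: n x f node features; W: f x f' weight matrix.
    """
    n = len(adj)
    # Add self-loops so every node keeps its own features and degree > 0.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation: entry (i, j) divided by sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, H), W)
    return [[max(0.0, x) for x in row] for row in out]


def mean_readout(H):
    """Average node embeddings into one graph-level vector (assumed readout)."""
    n = len(H)
    return [sum(col) / n for col in zip(*H)]


def fuse(graph_vec, attr_vec):
    """Concatenate graph and attribute features (assumed fusion operator)."""
    return list(graph_vec) + list(attr_vec)


if __name__ == "__main__":
    adj = [[0, 1], [1, 0]]    # two connected nodes
    H = [[1.0], [3.0]]        # one feature per node
    W = [[1.0]]               # identity weight
    g = mean_readout(gcn_layer(adj, H, W))
    print(fuse(g, [0.5, 1.0]))  # [2.0, 0.5, 1.0]
```

With two connected nodes every normalised entry is 0.5, so both nodes receive 0.5 × 1.0 + 0.5 × 3.0 = 2.0 and the graph vector is [2.0]. A classifier over the fused vector would then decide whether a graph, and hence the document, is malicious; the abstract does not specify that classifier.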

Keywords

PDF document detection/graph neural network/deep learning/graph sampling/feature analysis/performance evaluation/system call

Funding

National Key Research and Development Program of China (2020YFB1806402)

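Year of publication

2024
Computer Engineering and Design
706th Institute, Second Academy of China Aerospace Science and Industry Corporation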

CSTPCD; Peking University core journal
Impact factor: 0.617
ISSN: 1000-7024
Number of references: 26