Malware detection based on multi-byte frequency domain visualization and deep learning
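SUN Shimiao 1, LIU Yashu 1, YAN Hanbing 2
Author information
- 1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
- 2. Operation Department, National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China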
Abstract
With the growth in the number and variety of malware, research on malware visualization has hit a bottleneck in improving detection efficiency. To improve accuracy, a malware visualization method based on improved multi-order Markov probability is proposed from the frequency-domain perspective. The visualization process takes into account the correlation between adjacent bytes and the byte distribution of assembly instructions of different lengths: Markov probabilities of different orders are calculated according to instruction length, and the resulting multi-order Markov images expand the sample size. Combined with deep learning, the IM-CNN (image of multi-order Markov-CNN) detection framework is constructed for classification and detection. The results show that IM-CNN reaches an accuracy of up to 99% on both the CNCERT and BIG2015 datasets and is only slightly affected by the class balance of the malware dataset.
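The abstract does not spell out how the Markov probabilities of each order are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simplified lag-based reading of "order": for a hypothetical parameter `gap`, entry `P[a][b]` estimates the probability that byte `b` appears `gap` positions after byte `a`. (A full k-th order Markov chain would condition on the preceding k bytes jointly; that is deliberately not done here.) The function names `markov_image` and `multi_order_images` are illustrative.

```python
def markov_image(data, gap=1):
    """Return a 256x256 matrix P as nested lists.

    P[a][b] estimates Pr(byte at i+gap == b | byte at i == a).
    ASSUMPTION: 'gap' is a lag-based stand-in for the Markov order;
    the paper's exact definition is not given in the abstract.
    """
    if gap < 1:
        raise ValueError("gap must be >= 1")
    # Count co-occurrences of (byte at i, byte at i+gap).
    counts = [[0] * 256 for _ in range(256)]
    for a, b in zip(data, data[gap:]):
        counts[a][b] += 1
    # Normalize each row into conditional probabilities.
    image = []
    for row in counts:
        total = sum(row)
        image.append([c / total if total else 0.0 for c in row])
    return image


def multi_order_images(data, max_gap=3):
    """Produce one probability image per gap in 1..max_gap (hypothetical helper)."""
    return [markov_image(data, g) for g in range(1, max_gap + 1)]


sample = bytes([1, 2, 1, 2, 1, 3])
images = multi_order_images(sample, max_gap=2)
```

Each returned matrix can be scaled to grayscale and fed to a CNN as one image, so a single binary yields several images, which is one way the sample size could grow as the abstract describes.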
Key words
cybersecurity/malware/visualization/Markov/deep learning/convolutional neural network (CNN)/classification detection
Funding
National Key Research and Development Program of China (2018YFB0803604)
Publication year
2024