基于样本嵌入的挖矿恶意软件检测方法

Cryptocurrency Mining Malware Detection Method Based on Sample Embedding

傅建明¹, 姜宇谦¹, 何佳², 郑锐³, 苏日古嘎¹, 彭国军¹

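Author information

1. Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 430072
2. Technology Center, Songshan Laboratory, Zhengzhou 450046
3. School of Computer and Information Engineering, Henan University, Kaifeng, Henan 475000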

Abstract

Cryptocurrency mining malware is highly profitable and anonymous, and it poses a serious threat to computer users and causes them substantial losses. To counter this threat, machine learning detectors based on static software features usually select a single type of static feature, or use ensemble learning to fuse the detection results of different kinds of static features. Both approaches ignore the intrinsic relationships between different kinds of static features, and their detection rates leave room for improvement. Starting from the internal hierarchical relationships of mining malware, this paper extracts, from the bottom up, the basic blocks, control flow graphs and function call graphs of samples as static features. It trains a three-layer model to embed each of these features as vectors, gradually aggregates the features from the bottom layer to the top layer, and finally feeds the top-level features to a classifier to detect mining malware. To simulate detection under real-world conditions, the model is first trained on a small experimental data set and then tested on another, larger data set. Experimental results show that the three-layer embedding model outperforms machine learning models proposed in recent years on mining malware detection, improving recall and accuracy over the other models by more than 7% and 3%, respectively.
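The bottom-up aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not specify the embedding networks, so deterministic hashed vectors stand in for learned instruction embeddings, and one round of neighbour averaging stands in for each trained graph layer. The names `token_vector`, `embed_block`, `embed_graph`, `embed_sample`, the dictionary layout of `toy_sample`, and the width `DIM` are all hypothetical.

```python
import hashlib

DIM = 8  # embedding width; an arbitrary choice for this sketch


def token_vector(token):
    """Deterministic stand-in for a learned instruction embedding (assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def embed_block(instructions):
    """Level 1: pool a basic block from its instruction tokens."""
    return mean([token_vector(t) for t in instructions])


def embed_graph(node_vectors, edges):
    """Levels 2 and 3: one round of neighbour averaging, then mean pooling.

    node_vectors: one vector per node; edges: (src, dst) index pairs.
    """
    neighbours = {i: [i] for i in range(len(node_vectors))}  # include self
    for src, dst in edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    updated = [mean([node_vectors[j] for j in neighbours[i]])
               for i in range(len(node_vectors))]
    return mean(updated)


def embed_sample(sample):
    """Bottom-up: blocks -> per-function CFG vectors -> call-graph vector."""
    function_vectors = []
    for func in sample["functions"]:
        block_vectors = [embed_block(b) for b in func["blocks"]]
        function_vectors.append(embed_graph(block_vectors, func["cfg_edges"]))
    return embed_graph(function_vectors, sample["call_edges"])


# Toy sample (made up): two functions, the first calls the second.
toy_sample = {
    "functions": [
        {"blocks": [["push", "mov", "call"], ["xor", "ret"]],
         "cfg_edges": [(0, 1)]},
        {"blocks": [["mov", "add", "jmp"], ["cmp", "jne"], ["ret"]],
         "cfg_edges": [(0, 1), (1, 0), (1, 2)]},
    ],
    "call_edges": [(0, 1)],
}

sample_vector = embed_sample(toy_sample)
print(len(sample_vector))
```

In a real pipeline the resulting sample-level vector would be passed to a trained classifier; that step is omitted here because the abstract does not say which classifier is used.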

Key words

Cryptocurrency mining malware/Static analysis/Machine learning/Graph embedding

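Funding

National Natural Science Foundation of China (61972297)

National Natural Science Foundation of China (62172308)

National Natural Science Foundation of China (62272351)

National Key Research and Development Program of China (2021YFB3101201)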

Publication year

2024

Journal: Computer Science (计算机科学)

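Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)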

Indexed in: CSTPCD, PKU Core (北大核心)

Impact factor: 0.944

ISSN: 1002-137X

Number of references: 31
