Malicious Code Classification Method Based on Transformer and CNN

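Chinese abstract: Existing CNN-based malicious code classification methods suffer from high training cost and low accuracy on minority classes. To address these problems, this paper combines the characteristics of CNN and Transformer and proposes a malicious code classification method based on an improved MobileVit. First, a malicious code visualization method is used to preprocess the samples, which accelerates model convergence. Then, combining CNN with the self-attention mechanism, a cost-sensitive MobileVit model is proposed. It improves the Transformer encoder structure and introduces Focal Loss, which reduces the training cost of the model, strengthens its ability to represent malicious code samples, and keeps it attentive to minority classes. Experiments show that the improved MobileVit model still holds an advantage in accuracy despite a clear reduction in network layers and parameters. On the Microsoft malware classification dataset it reaches a maximum accuracy of 98.88%, and compared with the unmodified model it improves precision, recall, and F1 score by 1.7%, 2.0%, and 2.1%, respectively. While keeping prediction accuracy above 99% for large malware families, the model improves accuracy on small malware families by up to 17%.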
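The abstract names a malicious code visualization preprocessing step but does not describe it. A common approach in the malware-imaging literature treats each byte of a binary as one grayscale pixel and reshapes the byte stream into rows of fixed width. The Python sketch below illustrates that idea under stated assumptions. The function names `bytes_to_grayscale` and `resize_nearest`, the row width of 256, and the nearest-neighbour resize to a fixed input size are hypothetical choices, not the paper's settings. For the Microsoft dataset, whose samples are distributed as hexadecimal dumps, the sketch also assumes the dump has already been decoded into raw bytes.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape a raw byte stream into a 2-D grayscale image (one byte = one pixel).

    `width` is a hypothetical fixed row width; the final row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = max(1, math.ceil(len(data) / width))
    padded = data + bytes(height * width - len(data))  # zero-pad the last row
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def resize_nearest(img: list[list[int]], out_h: int, out_w: int) -> list[list[int]]:
    """Nearest-neighbour resize so every sample matches a fixed model input size."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]
```

Calling `resize_nearest(bytes_to_grayscale(raw), 256, 256)` on a file's raw bytes `raw` yields a 256×256 input regardless of file size. Nearest-neighbour resizing is used here only to keep the sketch dependency-free. An image library's interpolating resize would be the more usual choice in practice.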

English title: Classification of malicious code based on transformer and CNN

English abstract: Existing CNN-based malware classification methods suffer from high training costs and low accuracy for minority classes. To overcome these limitations, this paper proposes a malware classification method based on an improved MobileVit, which combines the characteristics of CNN and Transformer. First, a malicious code visualization method is adopted to preprocess samples and accelerate model convergence. Then, combining CNN with a self-attention mechanism, a cost-sensitive MobileVit model is designed. It improves the Transformer encoder structure and introduces Focal Loss to reduce the training cost of the model, while enhancing its ability to represent malicious code samples and ensuring attention to minority classes. Experimental results demonstrate that the improved MobileVit model maintains an advantage in accuracy while significantly reducing the number of network layers and parameters. On the Microsoft malware classification dataset, the improved model reaches an accuracy of up to 98.88% and improves precision, recall, and F1 score by 1.7%, 2.0%, and 2.1%, respectively, compared with the unmodified model. The model achieves over 99% accuracy for large malware families and improves accuracy for small malware families by up to 17%.
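Focal Loss (Lin et al., 2017) multiplies each sample's cross-entropy by (1 - p_t)^γ, where p_t is the predicted probability of the true class. Confidently correct samples therefore contribute little to the loss, and training concentrates on hard samples, which in imbalanced data are often from minority classes. An optional per-class weight α_t makes the loss explicitly cost-sensitive. The plain-Python sketch below computes the multi-class form for single samples and for a batch. γ = 2 is the default from the original Focal Loss paper and is used here only as an assumption, since the abstract does not state the values this paper uses. Any α weights, for example larger weights for small malware families, are likewise hypothetical. A real training loop would use a deep learning framework's tensor implementation instead.

```python
import math
from typing import Optional, Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one sample's class logits."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0,
               alpha: Optional[Sequence[float]] = None) -> float:
    """Multi-class focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    alpha: optional per-class weights (hypothetical). gamma=2.0 is an assumed default.
    With gamma=0 and alpha=None this is ordinary cross-entropy.
    """
    log_p_t = log_softmax(logits)[target]  # log-probability of the true class
    p_t = math.exp(log_p_t)
    alpha_t = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)**gamma shrinks the loss of confidently correct (easy) samples
    return -alpha_t * (1.0 - p_t) ** gamma * log_p_t


def mean_focal_loss(batch_logits: Sequence[Sequence[float]], targets: Sequence[int],
                    gamma: float = 2.0,
                    alpha: Optional[Sequence[float]] = None) -> float:
    """Average focal loss over a batch of samples."""
    losses = [focal_loss(l, t, gamma, alpha) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)
```

For a confidently correct sample such as logits `[5.0, 0.0, 0.0]` with target 0, the factor (1 - p_t)^2 is close to 0. For a misclassified sample such as `[0.0, 5.0, 0.0]` with the same target, it is close to 1, so the hard sample dominates the averaged loss.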

English keywords:

Malicious code classification; Attention mechanism; Data imbalance; MobileVit

Authors:

牟雨萌、刘亮、张磊、苏莉媛

Affiliation:

School of Cyber Science and Engineering, Sichuan University, Chengdu 610065

Keywords:

Malicious code classification; attention mechanism; data imbalance; MobileVit

Funding:

Sichuan Science and Technology Program; Sichuan Science and Technology Program

Project numbers:

2021YFG0159; 2022YFG0171

Year of publication:

2024

DOI:

10.19907/j.0490-6756.2024.042004

Journal of Sichuan University (Natural Science Edition)

Sichuan University

CSTPCD; Peking University Core Journals

Impact factor: 0.358

ISSN: 0490-6756

Year, volume (issue): 2024, 61(4)

Number of references: 1