
Classification of malicious code based on Transformer and CNN

Existing CNN-based malware classification methods suffer from high training costs and low accuracy on minority classes. To address these problems, this paper combines the characteristics of CNN and Transformer and proposes a malicious code classification method based on an improved MobileVit. First, a malicious code visualization method is used for sample preprocessing, which accelerates model convergence. Then, combining CNN with the self-attention mechanism, a cost-sensitive MobileVit model is proposed: the Transformer encoder structure is improved and Focal Loss is introduced, which reduces the training cost, strengthens the model's ability to represent malicious code samples, and keeps the model attentive to minority classes. Experiments show that the improved MobileVit model retains its accuracy advantage even with markedly fewer network layers and parameters. On the Microsoft malware classification dataset, accuracy reaches up to 98.88%, and precision, recall, and F1 score improve by 1.7%, 2.0%, and 2.1%, respectively, over the unmodified model. Prediction accuracy for large malware families stays above 99%, while accuracy for small malware families improves by up to 17%.
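
The two ingredients named in the abstract, malicious code visualization and Focal Loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed image width, the zero-padding of the last row, the helper names `bytes_to_grayscale` and `focal_loss`, and the default `gamma=2.0` are all assumptions chosen for illustration. The loss follows the standard Focal Loss form, -alpha_t * (1 - p_t)^gamma * log(p_t), computed here for a single sample.

```python
import math


def bytes_to_grayscale(data, width=256):
    """Lay raw bytes out row by row as a grayscale image (one byte = one pixel).

    Assumption for illustration: fixed width, final partial row zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad the last row with zeros
        rows.append(row)
    return rows


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma=0 with alpha=None reduces to ordinary cross-entropy.
    alpha, if given, is a per-class weight list (hypothetical values).
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    image = bytes_to_grayscale(bytes([1, 2, 3, 4, 5]), width=2)
    print(image)
    # A confidently correct sample contributes far less under focal loss
    # than under cross-entropy, so hard (often minority) samples dominate.
    print(focal_loss([4.0, 0.0], 0, gamma=0.0), focal_loss([4.0, 0.0], 0))
```

Because the factor (1 - p_t)^gamma shrinks the loss of samples the model already classifies confidently, training signal shifts toward hard samples, which is the general mechanism by which Focal Loss helps with imbalanced classes.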

Keywords: Malicious code classification; Attention mechanism; Data imbalance; MobileVit

牟雨萌、刘亮、张磊、苏莉媛


School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China


Sichuan Science and Technology Program (2021YFG0159, 2022YFG0171)

2024

Journal of Sichuan University (Natural Science Edition)
Sichuan University

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.358
ISSN: 0490-6756
Year, volume (issue): 2024, 61(4)