Computer Applications and Software, 2024, Vol. 41, Issue 3: 313-320. DOI: 10.3969/j.issn.1000-386x.2024.03.048

A MALICIOUS FILE DETECTION METHOD BASED ON "KEY WORDS" WEIGHTED LDA MODEL

Xu Jianguo (徐建国) 1, Wang Xuyang (王旭阳) 1

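Author information

  • 1. College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, Shandong, China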

Abstract

Malicious files often contain feature codes that occur infrequently but characterize a file better than frequent ones, and traditional methods fail to extract such features. To address this problem, a malicious file detection method based on a word-weighted LDA model is proposed. The method preprocesses samples by disassembly and uses an improved KeyGraph algorithm (IKG) to extract "key words", which have better feature representation ability. Optimized pointwise mutual information (OPMI) is then used to compute the weight of each "key word" and to build a word dictionary. This word weighting method is extended to the LDA model to build the IKG-OPMI-LDA (IOL) model, which performs the classification, and Gibbs sampling is used for parameter estimation. Experimental results show that, compared with other methods, the proposed method clearly improves classification accuracy and classification efficiency, and that the extracted features are more discriminative and more closely related to the topics.
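The weighting step can be illustrated with a minimal sketch. The abstract does not define the optimized variant (OPMI), so the code below uses plain pointwise mutual information between a token and a class label, PMI(w, c) = log(P(w, c) / (P(w) P(c))), estimated from document counts. The function name `pmi_weights`, the toy opcode tokens, and the labels are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def pmi_weights(docs, labels, target_label):
    """Standard PMI between each token and target_label (not the paper's OPMI).

    Probabilities are estimated from document counts:
    P(w) = df(w) / N, P(c) = N_c / N, P(w, c) = df(w, c) / N.
    """
    n_docs = len(docs)
    n_target = sum(1 for y in labels if y == target_label)
    doc_freq = Counter()    # documents containing the token
    joint_freq = Counter()  # target-class documents containing the token
    for tokens, y in zip(docs, labels):
        for w in set(tokens):
            doc_freq[w] += 1
            if y == target_label:
                joint_freq[w] += 1
    weights = {}
    for w, df in doc_freq.items():
        joint = joint_freq[w]
        if joint == 0:
            continue  # PMI is undefined (log 0) when the token never co-occurs
        weights[w] = math.log((joint * n_docs) / (df * n_target))
    return weights


# Hypothetical disassembled samples, each reduced to a list of opcode tokens.
docs = [
    ["push", "mov", "xor"],
    ["mov", "call"],
    ["xor", "xor", "jmp"],
    ["mov", "ret"],
]
labels = ["malicious", "benign", "malicious", "benign"]

weights = pmi_weights(docs, labels, "malicious")
for token, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{token}: {weight:.3f}")
```

In this toy data a token that appears only in malicious samples receives a positive weight, while a common token that appears mostly in benign samples receives a negative one. That is the general behavior a word-weighting scheme of this kind relies on: rare but class-specific tokens end up weighted above frequent, uninformative ones.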

Key words

Malicious files / LDA / IKG / Weighted model / Document classification


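Funding

Qingdao Philosophy and Social Science Planning Project (2016) (QDSKL1601121)

Shandong Provincial Higher Education Humanities and Social Sciences Research Program, Special Research Project on Ideological and Political Education (2017) (J17ZZ27)

Shandong University of Science and Technology Graduate Science and Technology Innovation Project (2018) (SDKDYC180339)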

Publication year

2024
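Computer Applications and Software (计算机应用与软件)
Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.615
ISSN: 1000-386X
Number of references: 21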