A MALICIOUS FILE DETECTION METHOD BASED ON "KEY WORDS" WEIGHTED LDA MODEL
Malicious files often contain feature codes that appear infrequently but have strong characterization capability. Traditional methods fail to extract this type of feature. To address this problem, a malicious file detection method based on a word-weighted LDA model is proposed. The method preprocesses the samples through disassembly and extracts "key words" with an improved KeyGraph algorithm (IKG); these words have better characteristic representation ability. Optimized pointwise mutual information (OPMI) is used to calculate the weight of each "key word" and to build a word dictionary. This word weighting method is then extended to the LDA model, yielding the IKG-OPMI-LDA (IOL) model, which completes the classification. Gibbs sampling is adopted for parameter estimation. The experimental results show that, compared with other methods, the proposed method significantly improves classification accuracy and classification efficiency, and the extracted features are more discriminative and more strongly correlated with the topics.
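The weighting-and-sampling idea can be illustrated with a small Python sketch. This is not the authors' IOL implementation, and several pieces are assumptions: IKG keyword extraction is skipped (the toy documents are already short opcode lists), standard positive PMI between a word and the malicious class stands in for OPMI (whose exact form is not given here), and each word's weight is set to 1 + max(PMI, 0), a hypothetical choice so that no token is discarded. During collapsed Gibbs sampling, each token adds its word weight, rather than 1, to the document-topic and topic-word counts. The function names `pmi_weights` and `weighted_lda_gibbs` are hypothetical.

```python
import math
import random
from collections import Counter

def pmi_weights(docs, labels, target=1):
    """Hypothetical stand-in for OPMI: 1 + positive PMI(word, target class),
    computed from document frequencies."""
    n_docs = len(docs)
    n_target = sum(1 for y in labels if y == target)
    df, df_target = Counter(), Counter()
    for doc, y in zip(docs, labels):
        for w in set(doc):
            df[w] += 1
            if y == target:
                df_target[w] += 1
    weights = {}
    for w, n_w in df.items():
        n_wy = df_target[w]
        if n_wy == 0:
            weights[w] = 1.0  # never seen with the target class: neutral weight
            continue
        # PMI = log P(w, y) / (P(w) P(y)) with document-level probabilities
        pmi = math.log(n_wy * n_docs / (n_w * n_target))
        weights[w] = 1.0 + max(pmi, 0.0)
    return weights

def weighted_lda_gibbs(docs, weights, n_topics, alpha=0.1, beta=0.01,
                       iters=100, seed=0):
    """Collapsed Gibbs sampling in which each token contributes its word
    weight (not 1) to the counts. Returns per-document topic proportions."""
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})
    topics = range(n_topics)
    ndk = [[0.0] * n_topics for _ in docs]  # weighted doc-topic counts
    nkw = [Counter() for _ in topics]       # weighted topic-word counts
    nk = [0.0] * n_topics                   # weighted tokens per topic
    z = []                                  # topic assignment per token
    for d, doc in enumerate(docs):          # random initialization
        z.append([])
        for w in doc:
            k, wt = rng.randrange(n_topics), weights.get(w, 1.0)
            ndk[d][k] += wt; nkw[k][w] += wt; nk[k] += wt
            z[d].append(k)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wt = z[d][i], weights.get(w, 1.0)
                ndk[d][k] -= wt; nkw[k][w] -= wt; nk[k] -= wt  # remove token
                probs = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                         / (nk[t] + V * beta) for t in topics]
                k = rng.choices(topics, weights=probs)[0]
                ndk[d][k] += wt; nkw[k][w] += wt; nk[k] += wt  # add it back
                z[d][i] = k
    return [[(ndk[d][t] + alpha) / (sum(ndk[d]) + n_topics * alpha)
             for t in topics] for d in range(len(docs))]

# Toy disassembled samples: 1 = malicious, 0 = benign (illustrative data only)
docs = [["xor", "xor", "call", "mov"], ["xor", "push", "call"],
        ["mov", "push", "ret"], ["mov", "ret", "add"]]
labels = [1, 1, 0, 0]
weights = pmi_weights(docs, labels)
theta = weighted_lda_gibbs(docs, weights, n_topics=2, iters=50)
print(round(weights["xor"], 3), weights["mov"])  # 1.693 1.0
```

Here `xor`, which occurs only in the malicious samples, receives a higher weight than `mov`, which is spread across both classes. The rows of `theta` are per-sample topic proportions. In a pipeline like the one described, they could serve as input features to a downstream classifier.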