Design and Implementation of Clustering and Clustering Fusion Algorithms Based on File Instructions for Malicious Software Detection
Traditional client-side virus scanning and removal can no longer keep up with security requirements given today's explosive growth of malicious software. Automatically, quickly, and accurately identifying, analyzing, and processing large numbers of unknown files poses new requirements and challenges for data mining. This paper focuses on representing malicious software features based on file instructions. A weighted subspace clustering method, WKM, is proposed to handle irrelevant instruction-sequence features; it addresses the difficulty traditional clustering has in finding malware families that are submerged in the full feature space. A hybrid clustering method, PFHK, is proposed for instruction-frequency features; it addresses the irregular cluster shapes and uneven densities of malicious software data, which hierarchical or partitioning methods cannot handle on their own. A clustering fusion method, CCE, is then introduced to fuse the results of different clustering algorithms, and it can also incorporate user-defined constraints. Compared with other commonly used anti-malware software, the proposed system detects 1.2 to 1.3 times as many viruses per day, and its performance is significantly better than that of commonly used anti-malware software when processing takes more than 30 seconds.
clustering; clustering fusion; malicious software classification; file instructions; instruction frequency
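To make the idea of clustering fusion with user-defined constraints concrete, the following is a minimal sketch of one generic fusion scheme: co-association (evidence accumulation) voting. It is not the CCE method itself, whose details are not given in the abstract. The function names `co_association` and `fuse`, the `threshold` parameter, and the `must_link` constraint format are all assumptions made for illustration.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base clusterings that place each pair of files together."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(labelings)
    return [[c / m for c in row] for row in counts]


def fuse(labelings, threshold=0.5, must_link=()):
    """Consensus clustering: merge pairs that co-occur in more than
    `threshold` of the base clusterings, then apply hypothetical
    user-supplied must-link constraints given as (i, j) index pairs."""
    n = len(labelings[0])
    co = co_association(labelings)
    parent = list(range(n))  # union-find forest over file indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            union(i, j)
    for i, j in must_link:
        union(i, j)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three base clusterings of five files (e.g. from different algorithms).
labelings = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
consensus = fuse(labelings)
constrained = fuse(labelings, must_link=[(1, 2)])
```

In this toy input, files 0 and 1 are grouped together by every base clustering, while files 2, 3, and 4 are linked by two of the three, so the majority vote yields two consensus clusters. Adding a must-link constraint between files 1 and 2 merges everything into one cluster, which shows how a user constraint can override the vote.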