Malware variant detection method based on heterogeneous graph attribute enhancement
Attackers increasingly circumvent malware detection by modifying the source code of malicious software. The complex relationships among malware variants in code reuse, coding style, attack behavior, and other aspects pose significant challenges to malware analysis. In recent years, graph neural networks have been widely applied to malware classification and detection because of their powerful capabilities in modeling graph-structured data and learning complex relationships between entities. This approach makes it possible to model the complex relationships between malware and its variants, overcoming the limitations of isolated analysis. However, existing methods have two shortcomings. First, they lack a comprehensive characterization of the multi-dimensional relationships among malware and its variants, so these relationships are underutilized. Second, they focus only on the topological structure of malware and ignore the semantic information of entities, which allows attackers to forge features easily through adversarial methods and thus evade detection. In addition, the lack of semantic information for entities such as Windows APIs and communication IPs further hinders the extraction and representation of semantic information. Integrating comprehensive relationships with feature semantic information is therefore crucial for improving the robustness and accuracy of malware variant detection. Accordingly, the authors propose a malware variant detection method enhanced by heterogeneous graph attributes. Specifically, the authors construct a heterogeneous information network to capture the complex relationships between malware and its features. Using this network, malware variant detection is transformed into a node classification problem on a heterogeneous graph. The authors then formulate semantic attributes for the entity nodes to enhance the representation of node information. For entity nodes whose semantic information is sparse, the authors derive the semantic information from external open-source data to address the deficiency. Finally, guided by topological relationships, the authors use an attention mechanism to aggregate information from nodes with attributes to compensate for those without attributes, achieving attribute completion. Following an iterative optimization approach, the authors alternately optimize the completion process and the heterogeneous graph node embedding process, yielding a unified method for malware variant detection that leverages attribute completion in heterogeneous graphs. Experimental results show that the proposed method significantly improves the performance of malware variant detection, outperforming other state-of-the-art models across multiple datasets.
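
The attribute completion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the attention scoring function, so softmax-normalized dot products of topological embeddings are assumed here, and the function name `complete_attributes`, the dictionary-based graph layout, and the toy node names are all hypothetical. The alternating optimization with the node embedding process is not shown.

```python
import math


def softmax(scores):
    """Normalize raw scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def complete_attributes(neighbors, attrs, emb):
    """Fill in attribute vectors for nodes that lack them.

    neighbors: dict node -> list of adjacent nodes (graph topology)
    attrs:     dict node -> attribute vector, only for nodes that have one
    emb:       dict node -> topological embedding used to score neighbors
               (assumed scoring: dot product, not taken from the paper)
    """
    completed = dict(attrs)
    for v, nbrs in neighbors.items():
        if v in attrs:
            continue  # node already has attributes
        # Candidate sources: direct neighbors that carry attributes.
        sources = [u for u in nbrs if u in attrs]
        if not sources:
            continue  # nothing to aggregate from; leave v without attributes
        weights = softmax([dot(emb[v], emb[u]) for u in sources])
        dim = len(attrs[sources[0]])
        # Attention-weighted sum of the neighbors' attribute vectors.
        completed[v] = [
            sum(w * attrs[u][i] for w, u in zip(weights, sources))
            for i in range(dim)
        ]
    return completed


# Toy graph: node "c" has no attributes and is linked to "a" and "b".
neighbors = {"a": ["c"], "b": ["c"], "c": ["a", "b"]}
attrs = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
emb = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [1.0, 0.0]}

completed = complete_attributes(neighbors, attrs, emb)
print(completed["c"])
```

Because "a" and "b" have identical embeddings in this toy input, they receive equal attention weights, and the completed vector for "c" is the plain average of their attributes. In a trained model the weights would differ according to the learned embeddings.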