MALICIOUS JAVASCRIPT CODE DETECTION METHOD BASED ON BI-LSTM MODEL
Traditional static methods for detecting malicious JavaScript code rely heavily on known malicious code features and cannot effectively extract features from obfuscated malicious code, which leads to low precision when detecting obfuscated malicious JavaScript code. To address this problem, a malicious code detection model based on bidirectional long short-term memory (Bi-LSTM) is proposed. The method converts JavaScript code into a sequence of syntactic units through an abstract syntax tree, uses the Doc2Vec algorithm to represent the syntactic unit sequence as distributed vectors, and feeds the resulting sentence vector matrix into the Bi-LSTM model for detection. Experimental results show that the method detects obfuscated malicious JavaScript code effectively and efficiently, with an accuracy of 97.03% and a recall of 97.10%.
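The first step of the pipeline, turning an abstract syntax tree into a syntactic unit sequence, can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the AST is an ESTree-style nested dictionary (the shape produced by parsers such as Esprima), a "syntactic unit" is taken to be a node's `type` field, and the traversal order is depth-first pre-order. The function name `syntactic_units` and the hand-written `EXAMPLE_AST` (a tree for the statement `eval(x);`) are hypothetical and serve only as illustration.

```python
from typing import Any, Iterator


def syntactic_units(node: Any) -> Iterator[str]:
    """Yield node types in depth-first pre-order from an ESTree-style AST.

    Assumption: each AST node is a dict with a "type" key, and child nodes
    are stored as dict values or as lists of dicts.
    """
    if isinstance(node, dict):
        if "type" in node:
            yield node["type"]
        # Recurse into every field; plain strings and numbers yield nothing.
        for value in node.values():
            yield from syntactic_units(value)
    elif isinstance(node, list):
        for item in node:
            yield from syntactic_units(item)


# Hand-written ESTree-style AST for the JavaScript statement `eval(x);`.
EXAMPLE_AST = {
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "CallExpression",
                "callee": {"type": "Identifier", "name": "eval"},
                "arguments": [{"type": "Identifier", "name": "x"}],
            },
        }
    ],
}

units = list(syntactic_units(EXAMPLE_AST))
print(units)
# ['Program', 'ExpressionStatement', 'CallExpression', 'Identifier', 'Identifier']
```

Under these assumptions, the sequence depends only on the syntactic structure, so renaming `x` to an obfuscated identifier leaves it unchanged. A sequence of this kind would then be embedded (with Doc2Vec in the described method) before being passed to the Bi-LSTM classifier.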