Multi-level Pruning Obfs4 Obfuscated Traffic Recognition Method Based on Partial Data

Obfs4 obfuscated traffic is a type of traffic carried by the anonymous communication network Tor, and its strong anonymity has led to its abuse for illicit online activities. Identifying Obfs4 obfuscated traffic therefore plays an important role in preventing cybercrime conducted over the Tor network. Existing approaches tend to focus on analyzing Obfs4 traffic features and apply machine learning or deep learning to complete flow samples for fine-grained identification. In online flow identification, however, this incurs high time overhead, and recognition accuracy drops once Obfs4 enables its inter-arrival timing (IAT) anti-detection technique. To address this, a multi-level pruning method for Obfs4 obfuscated traffic recognition based on partial data is proposed. It collects only the first few packets of each flow for several rounds of rapid filtering and designs its identification specifically around the characteristics of IAT mode, improving both the efficiency and the robustness of Obfs4 traffic identification. The method divides identification into a handshake phase and an encrypted communication phase. In the handshake phase, it mines the implicit semantics of Obfs4 handshake packets and performs coarse-grained, fast pruning based on randomness, timing, and length-distribution features. In the encrypted communication phase, it extracts features from the first several packets of each flow, raises the weight of IAT-related features, and finally performs fine-grained identification with the XGBoost classifier. Experimental results show that, on a dataset that includes obfuscated traffic with IAT enabled, using the first 30~50 packets of a flow achieves 99% accuracy and precision, with an average per-flow processing time on the order of milliseconds.
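The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the entropy and length thresholds, and the feature set are all hypothetical, chosen only to show how a coarse randomness and length check on the first payload could prune flows before timing and length features are computed from the first n packets. In the described method such features would then be passed to an XGBoost classifier, which is omitted here.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(payload: bytes) -> float:
    """Empirical byte entropy in bits per byte (0.0 for empty input)."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum(c / n * math.log2(c / n) for c in Counter(payload).values())


def passes_handshake_pruning(first_payload: bytes,
                             min_entropy: float = 6.0,  # hypothetical threshold
                             min_len: int = 64,         # hypothetical bound
                             max_len: int = 8192) -> bool:  # hypothetical bound
    """Coarse filter: keep only flows whose first payload looks random
    and whose length falls inside an assumed plausible range."""
    if not (min_len <= len(first_payload) <= max_len):
        return False
    return shannon_entropy(first_payload) >= min_entropy


def partial_flow_features(packets, n=30):
    """Compute features from the first n packets of a flow.

    packets: list of (timestamp_seconds, length_bytes) in arrival order.
    """
    head = packets[:n]
    lengths = [length for _, length in head]
    # Inter-arrival times between consecutive packets in the prefix.
    iats = [b[0] - a[0] for a, b in zip(head, head[1:])]
    return {
        "pkt_count": len(head),
        "len_mean": mean(lengths) if lengths else 0.0,
        "len_std": pstdev(lengths) if lengths else 0.0,
        "iat_mean": mean(iats) if iats else 0.0,
        "iat_std": pstdev(iats) if iats else 0.0,
        "iat_max": max(iats, default=0.0),
    }
```

Flows rejected by `passes_handshake_pruning` never reach feature extraction, which is where the time saving of a pruning design would come from; the abstract's emphasis on IAT could be reflected by adding further IAT-derived entries to the feature dictionary.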

Obfs4; Obfuscated traffic recognition; Multi-level pruning; Inter-arrival time reverse detection; XGBoost

徐宸涵、黄河、孙玉娥、杜扬

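School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006

School of Rail Transportation, Soochow University, Suzhou, Jiangsu 215131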

National Natural Science Foundation of China (62332013, 62072322, U20A20182, 62202322)

2024

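Computer Science
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)

CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(4)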