Summary
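By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News: Investigators discuss new findings in machine learning. According to news reporting from Beijing by NewsRx journalists, research stated: "Over the past decade, machine learning has increasingly been applied to a variety of tasks, including network anomaly detection. However, anomaly detection methods based on a single machine learning algorithm usually fail to achieve good results, because network traffic has complex and changeable patterns." Funding for this research came from the National Key R&D Program of China and the Tsinghua University-China Telecom Joint Research Institute for Next Generation Internet Technology. The news reporters quoted research from Tsinghua University: "Therefore, many methods based on ensemble learning have been proposed to address this problem. However, the main drawback of previous studies is that they overlook the similarity between weak classifiers, which may degrade detection performance; moreover, most existing work relies on offline, supervised algorithms." The paper proposes ADSim, an online, unsupervised, and similarity-aware network anomaly detection algorithm based on ensemble learning. The goal of ADSim can be described intuitively as identifying similar weak classifiers during the training phase and treating them as a whole. To this end, ADSim first builds a distance matrix in the training phase to record the similarity between classifiers, and then groups similar weak classifiers with hierarchical clustering. ADSim was tested on two datasets, MAWILab and CIC-IDS-2017.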
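The training phase described above has two steps: recording pairwise classifier similarity in a distance matrix, then grouping similar classifiers with hierarchical clustering. The Python sketch below illustrates one way to do this; it is not ADSim's implementation, whose details the abstract does not give. It assumes each weak classifier emits a binary verdict per sample (0 = normal, 1 = anomalous), takes the distance between two classifiers to be the running fraction of samples on which they disagree, and uses single-linkage agglomerative clustering with a hypothetical cutoff `threshold`. The names `SimilarityTracker` and `single_linkage_clusters` are illustrative only.

```python
from itertools import combinations


class SimilarityTracker:
    """Hypothetical helper: pairwise distances between weak classifiers.

    The distance between classifiers i and j is the running fraction of
    samples on which their binary verdicts (0 = normal, 1 = anomalous) differ.
    """

    def __init__(self, n_classifiers):
        self.n = n_classifiers
        self.count = 0  # samples seen so far
        # disagreements[i][j] counts samples on which classifiers i and j differed
        self.disagreements = [[0] * n_classifiers for _ in range(n_classifiers)]

    def update(self, verdicts):
        """Fold one sample's verdicts into the counts (incremental, no labels)."""
        if len(verdicts) != self.n:
            raise ValueError("expected one verdict per classifier")
        self.count += 1
        for i, j in combinations(range(self.n), 2):
            if verdicts[i] != verdicts[j]:
                self.disagreements[i][j] += 1
                self.disagreements[j][i] += 1

    def distance(self, i, j):
        if self.count == 0:
            return 0.0
        return self.disagreements[i][j] / self.count


def single_linkage_clusters(tracker, threshold):
    """Agglomerative single-linkage clustering: keep merging the two closest
    clusters until the closest pair is farther apart than `threshold`."""
    clusters = [[i] for i in range(tracker.n)]

    def linkage(a, b):
        # single linkage: distance between the closest members of two clusters
        return min(tracker.distance(i, j) for i in a for j in b)

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is still valid
    return clusters


# Demo: classifiers 0 and 1 always agree; 2 and 3 disagree on one of four samples.
tracker = SimilarityTracker(4)
for verdicts in ([1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0], [0, 0, 0, 0]):
    tracker.update(verdicts)
print(single_linkage_clusters(tracker, threshold=0.3))  # → [[0, 1], [2, 3]]
```

Because the tracker only stores disagreement counts, each new sample costs one pass over the classifier pairs, which keeps the distance matrix cheap to maintain incrementally. Single linkage is chosen here purely for simplicity; complete or average linkage would be equally plausible readings of "hierarchical clustering".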
Abstract
By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News: Investigators discuss new findings in Machine Learning. According to news reporting originating in Beijing, People's Republic of China, by NewsRx journalists, research stated, "The last decade has seen the increasing application of machine learning to various tasks, including network anomaly detection. But anomaly detection methods based on a single machine learning algorithm usually fail to achieve good results, since network traffic has complex and changeable patterns."

Financial supporters for this research include the National Key R&D Program of China and the Tsinghua University-China Telecom Joint Research Institute for Next Generation Internet Technology.

The news reporters obtained a quote from the research from Tsinghua University, "Therefore, many solutions based on ensemble learning have been proposed to address this problem. However, most previous studies have the main drawback that they overlook the similarity between the weak classifiers, which may degrade the detection performance. What is more, most existing works use offline and supervised algorithms, which means a large number of computing resources and reliable labels are necessary during the training period. In this paper, we propose ADSim, an online, unsupervised, and similarity-aware network anomaly detection algorithm based on ensemble learning. For a similarity-aware scheme, the target of ADSim can be intuitively described as recognizing the similar weak classifiers during the training phase and treating them as a whole. To achieve this, ADSim first incrementally maintains a distance matrix to record the similarity between the classifiers in the training phase and uses hierarchical clustering to group the similar classifiers. In the detecting phase, each cluster will be assigned a weight depending on the consistency of the detection results of the classifiers within it.
Moreover, the working procedure of ADSim is online and unsupervised, which significantly improves its practicality. We test ADSim on two datasets, MAWILab and CIC-IDS-2017."
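In the detecting phase described in the abstract, each cluster of similar classifiers receives a weight that depends on how consistent its members' results are. The abstract does not give the weighting formula, so the Python sketch below assumes one: each cluster casts a single vote equal to its members' majority verdict, weighted by the fraction of members that agree with that majority (1.0 when unanimous, 0.5 when evenly split). The function name `ensemble_score` and the tie rule are hypothetical.

```python
def ensemble_score(verdicts, clusters):
    """Hypothetical cluster-weighted vote.

    verdicts: one binary verdict per weak classifier (0 = normal, 1 = anomalous).
    clusters: lists of classifier indices; similar classifiers share a cluster.
    Each cluster casts ONE vote (its members' majority verdict, ties -> normal),
    weighted by the fraction of members that agree with that majority.
    Returns the weighted share of "anomalous" votes, in [0, 1].
    """
    total_weight = 0.0
    anomalous_weight = 0.0
    for members in clusters:
        frac = sum(verdicts[i] for i in members) / len(members)
        label = 1 if frac > 0.5 else 0
        weight = max(frac, 1.0 - frac)  # 1.0 if unanimous, 0.5 if evenly split
        total_weight += weight
        anomalous_weight += weight * label
    return anomalous_weight / total_weight


# Demo: three near-duplicate classifiers (cluster [0, 1, 2]) say "anomalous".
# A plain majority over all six classifiers would flag the sample (4 of 6),
# but the redundant cluster counts only once here.
verdicts = [1, 1, 1, 0, 1, 0]
clusters = [[0, 1, 2], [3], [4, 5]]
print(ensemble_score(verdicts, clusters))  # → 0.4
```

With a decision threshold of 0.5 (also an assumption), the demo sample would be treated as normal: the three redundant classifiers no longer outvote the rest simply by being numerous, which is the effect the similarity-aware scheme aims for.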
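The abstract stresses that ADSim works online and without labels but does not say what its weak classifiers are. As a hypothetical illustration of a weak classifier with both properties, the Python sketch below keeps a running mean and variance of one traffic feature using Welford's algorithm and flags values more than `k` standard deviations from the mean seen so far. `OnlineZScoreDetector`, `k`, `warmup`, and the two-feature sample layout are all assumptions, not details from the paper.

```python
import math


class OnlineZScoreDetector:
    """Hypothetical unsupervised, online weak classifier for one feature."""

    def __init__(self, feature_index, k=3.0, warmup=10):
        self.feature_index = feature_index  # which field of the sample to watch
        self.k = k                          # flag if |x - mean| > k * std
        self.warmup = max(warmup, 2)        # need >= 2 samples for a sample std
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                       # running sum of squared deviations

    def _update(self, x):
        # Welford's algorithm: numerically stable running mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def observe(self, sample):
        """Score the sample, then learn from it (test-then-train).
        Returns 1 for anomalous, 0 for normal; no label is ever needed."""
        x = sample[self.feature_index]
        verdict = 0
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) > self.k * std:
                verdict = 1
        self._update(x)
        return verdict


# Hypothetical two-feature samples: (packets per second, mean packet size).
detectors = [OnlineZScoreDetector(0), OnlineZScoreDetector(1)]
normal_traffic = [(100 + (t % 5) - 2, 500 + (t % 3)) for t in range(50)]
false_alarms = sum(d.observe(s) for s in normal_traffic for d in detectors)
print(false_alarms)                                # → 0
print([d.observe((400, 501)) for d in detectors])  # → [1, 0]
```

Each detector uses constant memory and sees every sample once, so it fits a streaming setting. The test-then-train order means an anomalous value is still folded into the statistics afterwards; a real system might choose to skip updates on flagged samples instead.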