Mining Algorithm for Abnormal Traffic Time Points in Unstructured High-Dimensional Big Data
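Xie Haiyan (解海燕) 1, Li Jie (李杰) 1, Zhao Guodong (赵国栋) 2
Author information
- 1. Yinchuan University of Energy, Yinchuan, Ningxia 750100
- 2. Network and Information Management Center, Ningxia University, Yinchuan, Ningxia 750021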
Abstract
Unstructured data is high-dimensional: each sample contains a very large number of features, which leads to the curse of dimensionality. This makes it difficult to reduce the dimension while retaining effective features, and it degrades the accuracy of mining abnormal time points in big data traffic. To address this, a new method based on spatial mapping is proposed for mining abnormal traffic time points in unstructured high-dimensional big data. First, a sparse regression model is built from the geometric characteristics of the approximate solution set, and the sparse projection matrix that maps the high-dimensional target space to a low-dimensional target subspace is solved. Next, a high-density set is selected according to the density distribution as the candidate set of cluster centers, from which the initial cluster centers are determined. A pruning algorithm is then applied to each cluster formed by the clustering to select a candidate set of time points, and a secondary judgment is performed on this candidate set to mine the abnormal traffic time points. Experimental results show that dimensionality reduction effectively improves the accuracy of abnormal traffic mining. Compared with the other methods tested, the proposed method mines abnormal time points in high-dimensional big data traffic more accurately and in less time.
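The clustering stages named in the abstract (high-density candidate centers, per-cluster pruning, secondary judgment) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the points are taken to be already projected into a low-dimensional subspace (the sparse regression step is not shown), density is a neighbor count within `radius`, pruning keeps points farther than `prune_factor` times the mean distance to their cluster center, and the secondary judgment flags candidates with fewer than `min_neighbors` neighbors. All function and parameter names are hypothetical and are not taken from the paper.

```python
import math

def local_density(points, radius):
    """For each point, count how many other points lie within `radius`."""
    n = len(points)
    return [sum(1 for j in range(n)
                if j != i and math.dist(points[i], points[j]) <= radius)
            for i in range(n)]

def pick_initial_centers(points, k, radius):
    """Pick k initial centers from a high-density candidate set (assumed rule)."""
    dens = local_density(points, radius)
    cutoff = sorted(dens)[len(dens) // 2]          # median density as threshold
    candidates = [i for i, d in enumerate(dens) if d >= cutoff]
    centers = [max(candidates, key=lambda i: dens[i])]   # densest candidate first
    while len(centers) < k:
        # Greedily add the candidate farthest from the centers chosen so far.
        nxt = max((i for i in candidates if i not in centers),
                  key=lambda i: min(math.dist(points[i], points[c]) for c in centers))
        centers.append(nxt)
    return [points[i] for i in centers]

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans(points, centers, iters=20):
    """Plain k-means refinement starting from the given centers."""
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centers[c])
        centers = updated
    return centers, assign(points, centers)

def mine_abnormal_points(points, k, radius, prune_factor=2.0, min_neighbors=2):
    """Return sorted indices (time points) flagged as abnormal."""
    centers, labels = kmeans(points, pick_initial_centers(points, k, radius))
    dist = [math.dist(p, centers[l]) for p, l in zip(points, labels)]
    candidates = []
    for c in range(k):
        members = [i for i, l in enumerate(labels) if l == c]
        if not members:
            continue
        mean_d = sum(dist[i] for i in members) / len(members)
        # Pruning: discard points close to the center, keep the far ones.
        candidates += [i for i in members if dist[i] > prune_factor * mean_d]
    dens = local_density(points, radius)
    # Secondary judgment: a candidate is abnormal only if it is also isolated.
    return sorted(i for i in candidates if dens[i] < min_neighbors)

# Toy data: two tight groups of 9 points each, plus one isolated point (index 18).
points = ([(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
          + [(5 + 0.1 * i, 5 + 0.1 * j) for i in range(3) for j in range(3)]
          + [(1.5, 1.5)])
print(mine_abnormal_points(points, k=2, radius=0.5))  # flags index 18
```

In this sketch each list index stands for one time point, so the returned indices play the role of the abnormal time points. The two-stage filter (distance-based pruning, then a density check) is one plausible reading of "pruning followed by secondary judgment", not a reconstruction of the authors' procedure.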
Key words
Unstructured data / High-dimensional big data / Traffic / Abnormal time point / Mining method
Funding
Scientific Research Project of Yinchuan University of Energy (2023-KY-Z-3)
Natural Science Foundation of Ningxia (2021aac03118)
Publication year
2024