Mining Algorithm for Abnormal Traffic Time Points in Unstructured High-Dimensional Big Data (非结构化高维大数据异常流量时间点挖掘算法)

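Abstract (translated from the Chinese): Unstructured data is high-dimensional. Every sample carries a very large number of features, which gives rise to the curse of dimensionality, makes it hard to reduce the dimension while keeping feature extraction effective, and lowers the accuracy of mining abnormal time points in big data traffic. To address this, a new space-mapping-based method for mining abnormal traffic time points in unstructured high-dimensional big data is proposed. A sparse regression model is built from the geometric features of the approximate solution set, and the sparse projection matrix that maps the high-dimensional objective space to a low-dimensional objective subspace is solved. According to the density distribution, a high-density set is chosen as the candidate set of cluster centers, and the initial cluster centers are determined from it. A pruning algorithm is then applied to each resulting cluster to select a candidate set of time points, and a secondary judgment on that candidate set yields the abnormal traffic time points. Experimental results show that reducing the dimension of the data effectively improves the accuracy of abnormal traffic mining. By comparison, the proposed method mines abnormal traffic time points in high-dimensional big data more accurately and in less time.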

English title: Mining Algorithm for Abnormal Traffic Time Points for Unstructured High-Dimensional Big Data

English abstract: Unstructured data is generally high-dimensional. Each sample contains a large number of features, which leads to the curse of dimensionality and makes it difficult to reduce the dimension while preserving effective feature extraction; this in turn lowers the accuracy of mining abnormal time points in big data traffic. Therefore, a new method based on spatial mapping is proposed for mining abnormal time points in unstructured high-dimensional big data traffic. First, a sparse regression model is built using the geometric characteristics of the approximate solution set, and the sparse projection matrix that maps the high-dimensional space to a low-dimensional subspace is solved. Next, based on the density distribution, a high-density set is selected as the candidate set of cluster centers, from which the initial cluster centers are determined. A pruning algorithm is then applied to every cluster to select a candidate set of time points, and a secondary judgment is performed on this candidate set to mine the abnormal time points in high-dimensional big data traffic. Experimental results show that dimensionality reduction effectively improves the accuracy of abnormal traffic mining. In comparison, the proposed method is more accurate and more time-efficient in mining abnormal time points of high-dimensional big data traffic.
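The abstract gives only the outline of the clustering stage, so the following is a minimal sketch of how such a stage might look, not the authors' implementation. It assumes the feature vectors have already been projected to a low-dimensional space (the sparse-regression projection step is omitted), and every name and threshold in it (`local_density`, `pick_initial_centers`, `abnormal_time_points`, `radius`, `prune_factor`, `min_neighbors`) is hypothetical. Density is approximated by a neighbour count, pruning by a distance-to-center threshold, and the secondary judgment by a sparsity check.

```python
import math

def local_density(points, radius):
    """Number of other points within `radius` of each point (assumed density measure)."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]

def pick_initial_centers(points, k, radius):
    """Take the densest points as initial centers, keeping centers more than `radius` apart."""
    dens = local_density(points, radius)
    order = sorted(range(len(points)), key=lambda i: -dens[i])
    centers = []
    for i in order:
        if all(math.dist(points[i], c) > radius for c in centers):
            centers.append(points[i])
        if len(centers) == k:
            break
    return centers

def abnormal_time_points(series, k=2, radius=1.0, prune_factor=2.0, min_neighbors=2):
    """series: list of (timestamp, low-dimensional feature vector) pairs."""
    points = [tuple(v) for _, v in series]
    centers = pick_initial_centers(points, k, radius)
    # Assign each point to its nearest initial center (single assignment pass).
    labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]
    dists = [math.dist(p, centers[l]) for p, l in zip(points, labels)]
    dens = local_density(points, radius)
    flagged = []
    for c in range(len(centers)):
        members = [i for i, l in enumerate(labels) if l == c]
        mean_d = sum(dists[i] for i in members) / len(members)
        # Pruning (assumed rule): keep only members unusually far from their center.
        candidates = [i for i in members if dists[i] > prune_factor * mean_d]
        # Secondary judgment (assumed rule): a candidate is abnormal only if it is also isolated.
        flagged += [series[i][0] for i in candidates if dens[i] < min_neighbors]
    return sorted(flagged)

# Toy data: two dense groups of "normal" traffic and one isolated sample at time 8.
series = [
    (0, (0.0, 0.0)), (1, (0.1, 0.0)), (2, (0.0, 0.1)), (3, (0.1, 0.1)),
    (4, (5.0, 5.0)), (5, (5.1, 5.0)), (6, (5.0, 5.1)), (7, (5.1, 5.1)),
    (8, (2.0, 9.0)),
]
result = abnormal_time_points(series, k=2, radius=1.0)
```

On this toy input the isolated sample is the only one that survives both the pruning step and the secondary check. Seeding the centers from dense regions, as the abstract describes, keeps an outlier from being chosen as an initial center.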

English keywords:

Non-structural; High-dimensional big data; Rate of flow; Abnormal time point; Mining method

Authors:

解海燕、李杰、赵国栋

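Author affiliations:

Yinchuan University of Energy, Yinchuan, Ningxia 750100

Network and Information Management Center, Ningxia University, Yinchuan, Ningxia 750021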

Keywords:

Unstructured data; High-dimensional big data; Traffic; Abnormal time point; Mining method

Funding:

Yinchuan University of Energy Research Project; Ningxia Natural Science Foundation

Project numbers:

2023-KY-Z-3; 2021aac03118

Publication year:

2024

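Computer Simulation (计算机仿真)

The 17th Research Institute of China Aerospace Science and Industry Corporation (中国航天科工集团公司第十七研究所)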

CSTPCD

Impact factor: 0.518

ISSN: 1006-9348

Year, volume (issue): 2024, 41(7)