Design of Classification and Retrieval Algorithms for Abnormal Data in Big Data Networks

Zhang Wencong (张文聪)¹

Author information

1. Guangdong City Technician College (广东省城市技师学院), Guangzhou 510520

Abstract

Detecting abnormal data effectively is important for protecting the security of big data networks. To address the low accuracy and high time cost of conventional classification and retrieval algorithms, a classification and retrieval algorithm for abnormal data in big data networks is designed. Network big data features are selected by calculating their degree of influence: source IP information entropy, destination port information entropy, out-degree to in-degree ratio, unilateral connection density, data flow duration, TCP total, packet length, and mean idle time. The selected features are standardized and assembled into a feature vector that describes each network data sample. An improved density peak clustering algorithm then classifies the network big data samples. A similarity-based retrieval model computes the similarity between a reference sample of abnormal data and each cluster, and the cluster with the maximum similarity is taken as the abnormal cluster, which completes the retrieval of abnormal data. The results show that the proposed method achieves a better CH index, a larger Jaccard coefficient, and a lower total classification and retrieval time, indicating stronger classification and retrieval capability and more accurate retrieval of abnormal data in less time.
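The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The abstract does not specify the similarity measure, how a cluster is represented, or the improvements made to density peak clustering, so the sketch assumes cosine similarity, represents each cluster by its centroid, and takes the cluster assignments as given from whatever clustering step precedes it. All function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`,
    e.g. the source IPs or destination ports seen in one time window."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def standardize(rows):
    """Z-score every feature column; a constant column maps to 0.0."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    return [sum(c) / len(c) for c in zip(*points)]


def cosine_similarity(a, b):
    """Cosine similarity (assumed measure; the abstract does not name one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_abnormal_cluster(clusters, reference):
    """Return the label of the cluster most similar to the abnormal
    reference sample. `clusters` maps label -> list of feature vectors."""
    return max(
        clusters,
        key=lambda label: cosine_similarity(centroid(clusters[label]), reference),
    )


def jaccard(retrieved, actual):
    """Jaccard coefficient between retrieved and truly abnormal sample IDs."""
    retrieved, actual = set(retrieved), set(actual)
    union = retrieved | actual
    return len(retrieved & actual) / len(union) if union else 1.0


if __name__ == "__main__":
    # Toy window of source IPs; a more dispersed window gives higher entropy.
    print(shannon_entropy(["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3"]))
    # Toy clusters in an already standardized two-feature space.
    clusters = {"c0": [[1.0, 0.0], [0.9, 0.1]], "c1": [[0.0, 1.0], [0.1, 0.9]]}
    print(retrieve_abnormal_cluster(clusters, [0.2, 1.0]))
```

Representing a cluster by its centroid keeps the retrieval step to one similarity computation per cluster. Comparing the reference sample against every cluster member would be a reasonable alternative at a higher time cost.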

Key words

big data network/abnormal data/big data features/improved density peak clustering algorithm/similarity retrieval model/classification retrieval algorithm

Funding

World Bank Loan Vocational Education Development (Guangdong) Project (7720-CN)

Publication year

2024

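Automation & Instrumentation (自动化与仪器仪表)

Chongqing Industrial Automation Instrumentation Research Institute; Chongqing Society of Automation and Instrumentation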

CSTPCD

Impact factor: 0.327

ISSN: 1001-9227
