Effectively discovering abnormal data is of great significance for protecting the security of big data networks. To address the low accuracy and high time consumption of conventional classification and retrieval algorithms, a classification and retrieval algorithm for abnormal data in big data networks is designed. Features of network big data are selected by calculating their impact degree; they include source IP information entropy, destination port information entropy, ingress/egress ratio, unilateral connection density, data flow duration, TCP total, packet length, and average idle time, and they are then standardized. A feature vector composed of these features describes each network data sample. An improved density peak clustering algorithm is used to classify the network big data samples. A similarity-based retrieval model is then constructed, and the similarity between the reference sample of abnormal data and each category is calculated. The cluster with the maximum similarity is regarded as the abnormal cluster, which completes the retrieval of abnormal data. The results indicate that the proposed classification and retrieval method achieves a better CH index, a larger Jaccard coefficient, and a lower total time cost for classification and retrieval. This shows that the method has stronger classification and retrieval ability and can retrieve abnormal data more accurately in a shorter time.
Key words
big data network/abnormal data/big data features/improved density peak clustering algorithm/similarity retrieval model/classification retrieval algorithm
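The feature and retrieval steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure or how a cluster is represented, so cosine similarity against cluster centroids is an assumption here, and the function names (`information_entropy`, `standardize`, `retrieve_abnormal_cluster`) are hypothetical. The improved density peak clustering step is not reproduced; the clusters are assumed to be given already.

```python
import math
from collections import Counter


def information_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`,
    e.g. the source IPs or destination ports seen in one time window."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def standardize(rows):
    """Z-score every feature column; a constant column maps to 0.0."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def cosine_similarity(a, b):
    """Cosine similarity (assumed measure; the abstract does not specify one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def centroid(vectors):
    """Mean feature vector of a cluster (assumed cluster representative)."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def retrieve_abnormal_cluster(clusters, reference):
    """Return the label of the cluster most similar to the abnormal reference.

    `clusters` maps a label to a list of feature vectors produced by some
    clustering step; `reference` is the feature vector of a known abnormal sample.
    """
    return max(
        clusters,
        key=lambda label: cosine_similarity(centroid(clusters[label]), reference),
    )
```

In use, each traffic window would be turned into a feature vector (entropies, ratios, durations), the vectors standardized and clustered, and `retrieve_abnormal_cluster` called with the reference abnormal sample to pick the cluster treated as abnormal.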