Design of Classification and Retrieval Algorithms for Abnormal Data in Big Data Networks
Effectively discovering abnormal data is of great significance for protecting the security of big data networks. To address the low accuracy and high time cost of conventional classification and retrieval algorithms, a classification and retrieval algorithm for abnormal data in big data networks is designed. Features of network big data are selected by calculating their impact degree. The selected features are source IP information entropy, destination port information entropy, ingress/egress ratio, unilateral connection density, data flow duration, TCP total, packet length, and average idle time, and they are then standardized. Each network data sample is described by a feature vector composed of these features. An improved density peak clustering algorithm is used to classify the network big data samples. A similarity-based retrieval model is then constructed to calculate the similarity between a reference sample of abnormal data and each category. The cluster with the maximum similarity is regarded as the abnormal cluster, which completes the retrieval of abnormal data. The results indicate that the studied classification and retrieval method achieves a better CH (Calinski-Harabasz) index, a larger Jaccard coefficient, and a lower total classification and retrieval time. This indicates that the method has stronger classification and retrieval ability and retrieves abnormal data more accurately in less time.
Keywords: big data network; abnormal data; big data features; improved density peak clustering algorithm; similarity retrieval model; classification retrieval algorithm
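The pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It computes an information-entropy feature and applies z-score standardization, though the abstract does not give the standardization formula. It clusters with the classic density peak algorithm (Rodriguez and Laio) rather than the paper's improved variant. It also scores each cluster by cosine similarity between the cluster centroid and the reference abnormal sample, which is an assumed measure because the paper's similarity definition is not stated here. The impact-degree feature selection step is not reproduced, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits, e.g. of the source IPs or destination ports in one window."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def fit_standardizer(rows):
    """Per-feature mean and population standard deviation (assumed z-score scaling)."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        stats.append((mean, std))
    return stats

def standardize(row, stats):
    """Apply z-score scaling; a constant feature (std == 0) is mapped to 0."""
    return [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]

def density_peak_clusters(points, dc, n_centers):
    """Classic density peak clustering; NOT the paper's improved variant."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i: Gaussian kernel with cutoff distance dc.
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # descending density
    delta = [0.0] * n
    parent = [-1] * n  # index of the nearest point with higher density
    delta[order[0]] = max(dist[order[0]])
    for k in range(1, n):
        i = order[k]
        j = min(order[:k], key=lambda j: dist[i][j])
        delta[i], parent[i] = dist[i][j], j
    # Centers: largest gamma = rho * delta; the densest point is always a center.
    ranked = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + [i for i in ranked if i != order[0]][: n_centers - 1]
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    for i in order:  # parents come earlier in this order, so they are already labeled
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.hypot(*a), math.hypot(*b)
    return dot / (na * nb) if na and nb else 0.0

def retrieve_abnormal_cluster(points, labels, reference):
    """Return (label, similarity) of the cluster whose centroid best matches the reference."""
    best_label, best_sim = None, -math.inf
    for label in sorted(set(labels)):
        members = [p for p, l in zip(points, labels) if l == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        sim = cosine(centroid, reference)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim

# Toy 2-feature samples: [source IP entropy, mean packet length]; values are made up.
raw = [[1.0, 100], [1.1, 110], [0.9, 95], [1.0, 105],
       [4.0, 1500], [4.2, 1450], [3.9, 1480]]
stats = fit_standardizer(raw)
points = [standardize(r, stats) for r in raw]
labels = density_peak_clusters(points, dc=0.5, n_centers=2)
reference = standardize([4.1, 1490], stats)  # a known abnormal sample
print(labels, retrieve_abnormal_cluster(points, labels, reference))
```

With these toy values, the three high-entropy, large-packet samples should form their own cluster, and that cluster is the one returned for the reference sample. The reference must be standardized with the same statistics as the clustered samples; otherwise the similarity scores are not comparable.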