To implement differentiated protection for railway data of different sensitivity levels, this paper proposes an intelligent identification, classification, and grading method for railway sensitive data based on hierarchical topic analysis, providing a basis for the graded protection of railway network data. The method uses data semantics together with classification and grading rules to build a topic lexicon, and makes a preliminary determination of the sensitivity level of the data through topic analysis. Because sensitivity levels are unevenly distributed in railway network data, the paper designs a weighted aggregation mechanism for grade probability vectors and applies an agglomerative hierarchical clustering algorithm to achieve accurate grading. Experiments show that, compared with traditional topic analysis methods based on semantics and K-means clustering, the proposed method effectively alleviates the distribution imbalance problem and achieves fine-grained, dynamically adjustable intelligent identification and accurate grading of railway sensitive data. It thereby provides technical support for implementing railway network data grading management requirements and for keeping railway network data secure and controllable.
Keywords
Natural Language Processing (NLP) / agglomerative hierarchical clustering / topic analysis / railway network data / sensitive attribute identification / data classification and grading
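
The pipeline summarized in the abstract (lexicon-based topic analysis, weighted aggregation of grade probability vectors, then agglomerative hierarchical clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three-level lexicon `LEXICON`, the field names, the two "views" per field (field-name tokens and sample-value tokens), the weights 0.4 and 0.6, the smoothed-count scoring, and the average-linkage Euclidean clustering. The paper's actual lexicon construction and weighting scheme may differ.

```python
from itertools import combinations

# Hypothetical topic lexicon: sensitivity level -> indicative terms.
LEXICON = {
    0: {"station", "timetable", "route"},
    1: {"staff", "phone", "ticket"},
    2: {"id_number", "password", "signal_key"},
}

def grade_vector(tokens, lexicon=LEXICON, smoothing=1.0):
    """Probability vector over levels from smoothed lexicon hit counts."""
    counts = [smoothing + sum(t in lexicon[lvl] for t in tokens)
              for lvl in sorted(lexicon)]
    total = sum(counts)
    return [c / total for c in counts]

def aggregate(vectors, weights):
    """Weighted average of probability vectors, renormalized to sum to 1."""
    dim = len(vectors[0])
    mixed = [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]
    total = sum(mixed)
    return [m / total for m in mixed]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Hypothetical fields: (tokens from the field name, tokens from sample values).
fields = {
    "station_name": (["station"], ["route", "station"]),
    "train_timetable": (["timetable"], ["route"]),
    "staff_phone": (["staff", "phone"], ["phone"]),
    "user_password": (["password"], ["password", "id_number"]),
}
view_weights = [0.4, 0.6]  # assumed weights for the two views

vectors = [aggregate([grade_vector(v) for v in views], view_weights)
           for views in fields.values()]
clusters = agglomerative(vectors, n_clusters=3)

# Assign each cluster the level with the largest summed probability.
levels = [max(range(len(LEXICON)), key=lambda i: sum(vectors[j][i] for j in c))
          for c in clusters]
```

In this sketch each cluster inherits a single sensitivity level from its members' aggregated vectors, so a field with weak lexicon evidence is graded together with the fields it most resembles rather than by its own noisy argmax.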