
Intelligent identification, classification and grading method for railway sensitive data based on hierarchical topic analysis

To enable differentiated protection of railway data at different sensitivity levels, this paper proposes an intelligent method for identifying, classifying and grading railway sensitive data based on hierarchical topic analysis, intended to provide a basis for the graded protection of railway network data. The method builds a topic lexicon from data semantics and classification and grading rules, and uses topic analysis to make a preliminary judgment of each data item's sensitivity level. Because sensitivity levels in railway network data are unevenly distributed, the paper designs a weighted aggregation mechanism for graded probability vectors and uses an agglomerative hierarchical clustering algorithm to assign accurate levels. Experiments show that, compared with traditional topic analysis methods based on semantics and K-means clustering, the method effectively mitigates the imbalanced distribution and achieves fine-grained, dynamically adjustable intelligent identification and accurate grading of railway sensitive data. It thereby provides technical support for meeting the graded management requirements for railway network data and for keeping railway network data secure and controllable.
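The abstract names the stages of the pipeline (lexicon-based topic analysis, weighted aggregation of graded probability vectors, agglomerative hierarchical clustering) without giving their details. The Python sketch below shows one possible reading, under explicit assumptions: the three levels, the keyword lexicon `LEXICON`, the two evidence sources `name` and `values`, the source and level weights, and the rule of labelling each cluster by the argmax of its mean vector are all hypothetical and are not taken from the paper. Clustering uses a hand-written average-linkage agglomerative procedure so that only the standard library is needed.

```python
"""Hypothetical sketch: lexicon-based topic scoring, weighted aggregation of
per-level probability vectors, and agglomerative hierarchical clustering.
All lexicon entries, weights and field names are illustrative assumptions."""
from math import sqrt

LEVELS = [1, 2, 3]  # assumed sensitivity levels: 1 = low ... 3 = high

# Hypothetical topic lexicon mapping each level to indicative keywords.
LEXICON = {
    1: {"station", "line", "timetable"},
    2: {"train", "schedule", "staff"},
    3: {"passenger", "id", "phone"},
}


def level_probabilities(tokens, alpha=1.0):
    """Count lexicon hits per level; return a Laplace-smoothed probability vector."""
    counts = [sum(t in LEXICON[lv] for t in tokens) for lv in LEVELS]
    total = sum(counts) + alpha * len(LEVELS)
    return [(c + alpha) / total for c in counts]


def aggregate(sources, source_weights, level_weights):
    """Weighted sum of per-source vectors, rescaled per level and renormalised.

    Using level_weights > 1 for rare (high-sensitivity) levels is one assumed
    way to counter an imbalanced level distribution."""
    vec = [0.0] * len(LEVELS)
    for name, tokens in sources.items():
        p = level_probabilities(tokens)
        for k in range(len(LEVELS)):
            vec[k] += source_weights[name] * p[k]
    vec = [v * w for v, w in zip(vec, level_weights)]
    total = sum(vec)
    return [v / total for v in vec]


def distance(a, b):
    """Euclidean distance between two probability vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering: start from singletons and
    repeatedly merge the two clusters with the smallest mean pairwise distance."""
    clusters = [[i] for i in range(len(vectors))]
    n_clusters = max(1, n_clusters)
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(distance(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def grade(fields, n_clusters=3, source_weights=None, level_weights=None):
    """Assign every field the level with the highest mean probability in its cluster."""
    source_weights = source_weights or {"name": 0.6, "values": 0.4}
    level_weights = level_weights or [1.0, 1.0, 1.5]
    names = list(fields)
    vectors = [aggregate(fields[n], source_weights, level_weights) for n in names]
    result = {}
    for cluster in agglomerative(vectors, n_clusters):
        mean = [sum(vectors[i][k] for i in cluster) / len(cluster)
                for k in range(len(LEVELS))]
        level = LEVELS[max(range(len(LEVELS)), key=mean.__getitem__)]
        for i in cluster:
            result[names[i]] = level
    return result


if __name__ == "__main__":
    demo = {
        "station_name": {"name": ["station", "name"], "values": ["beijing", "station"]},
        "train_no": {"name": ["train", "no"], "values": ["g1", "schedule"]},
        "passenger_id": {"name": ["passenger", "id"], "values": ["id", "number"]},
        "contact_phone": {"name": ["phone"], "values": ["passenger", "phone"]},
    }
    print(grade(demo))
    # {'station_name': 1, 'train_no': 2, 'passenger_id': 3, 'contact_phone': 3}
```

Up-weighting level 3 through `level_weights` is only an illustrative way to counter level imbalance; the abstract does not state the paper's actual aggregation weights, distance measure, linkage criterion or number of clusters.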

Keywords: natural language processing (NLP); agglomerative hierarchical clustering; topic analysis; railway network data; sensitive attribute identification; data classification and grading

Jiang Wenbin (江文彬), Liu Zhaolin (刘兆霖), Xie Shikang (谢仕康), Fu Yixin (傅一馨), Li Qi (李琪)


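School of Cyber Security, Beijing Jiaotong University, Beijing 100044

Institute of Computing Technologies, China Academy of Railway Sciences Corporation Limited, Beijing 100081

Beijing Jingwei Information Technology Co., Ltd., Beijing 100081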


2024

Railway Computer Application (铁路计算机应用)
Sponsors: China Academy of Railway Sciences; Computer Committee of China Railway Society


Impact factor: 0.267
ISSN:1005-8451
Year, volume (issue): 2024, 33(10)