Data identification is a prerequisite for precise data governance and helps ensure the security of data elements during cross-domain transfer. Methods already exist for generating identifiers for individual data items, but as data volumes grow, item-level identifiers cannot be applied directly at the dataset level, and identifiers become "easily damaged" and "difficult to embed". To address these problems, we adopt the network honeypoint design concept from the "guardian" model proposed by Academician Fang Binxing. Drawing on the idea of deception defense, we propose a damage-resistant data identification technique based on dataset honeypoints for cross-domain data transfer scenarios, and design a complete method for generating and embedding dataset honeypoints. First, we design dataset honeypoints for cross-domain data transfer scenarios; enhancing their concealment and increasing their redundancy addresses the "easily damaged" problem. Second, ensuring that a dataset honeypoint is indistinguishable in form from real data resolves the "difficult to embed" problem. Finally, experiments on two data modalities, images and encrypted text, show that dataset honeypoints offer high damage resistance, high robustness, and low performance overhead.
Key words
cross-domain data transfer/data identification/dataset honeypoint/deception defense/damage resistance/embedding
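
The general idea of redundant, concealed decoy records can be illustrated with a toy sketch. This is not the method proposed in the paper, whose generation and embedding procedure is not specified in the abstract. It assumes, purely for illustration, that every record is a fixed-size 32-byte ciphertext-like blob, that honeypoints are derived with HMAC-SHA256 from a secret key, and that they are detected by recomputing them. The names `make_honeypoints`, `embed`, and `count_surviving` are hypothetical.

```python
import hashlib
import hmac
import random

RECORD_SIZE = 32  # assumption: fixed-size records, equal to one HMAC-SHA256 digest


def make_honeypoints(key: bytes, dataset_id: str, count: int) -> list[bytes]:
    """Derive `count` decoy records from a secret key and a dataset label.

    HMAC output looks random without the key, so the decoys resemble
    ciphertext records of the same size (the concealment assumption).
    """
    return [
        hmac.new(key, f"{dataset_id}:{i}".encode(), hashlib.sha256).digest()
        for i in range(count)
    ]


def embed(records: list[bytes], honeypoints: list[bytes], key: bytes) -> list[bytes]:
    """Return a copy of `records` with the honeypoints scattered through it."""
    # Positions come from a key-derived seed. The `random` module is not a
    # cryptographic generator; it is used here only to keep the sketch short.
    seed = int.from_bytes(hashlib.sha256(b"positions" + key).digest(), "big")
    rng = random.Random(seed)
    mixed = list(records)
    for hp in honeypoints:
        mixed.insert(rng.randrange(len(mixed) + 1), hp)
    return mixed


def count_surviving(records: list[bytes], key: bytes, dataset_id: str, count: int) -> int:
    """Count how many of the expected honeypoints are still present.

    Detection is by value, not by position, so it tolerates deletion
    and reordering of records.
    """
    expected = set(make_honeypoints(key, dataset_id, count))
    return sum(1 for r in records if r in expected)


if __name__ == "__main__":
    key = b"demo-secret-key"  # hypothetical key for the demonstration
    data_rng = random.Random(0)
    real = [data_rng.randbytes(RECORD_SIZE) for _ in range(1000)]

    mixed = embed(real, make_honeypoints(key, "dataset-A", 20), key)
    # Simulate damage: keep a random half of the records.
    damaged = random.Random(1).sample(mixed, len(mixed) // 2)

    print("embedded:", count_surviving(mixed, key, "dataset-A", 20))
    print("after 50% deletion:", count_surviving(damaged, key, "dataset-A", 20))
```

In this sketch, redundancy is simply the number of honeypoints: the identifier is lost only if every decoy is removed, which becomes less likely as the count grows. Matching decoys by value rather than by position is a design choice that keeps detection working after records are deleted or shuffled.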