Software, 2024, Vol. 45, Issue 3: 51-53. DOI: 10.3969/j.issn.1003-6970.2024.03.012

Research on Methods for Sensitive Information Detection in Chinese Text

Dong Siyuan (董思源)¹, Wang Ziyang (王子扬)², Zhang Kun (章坤)¹, Sun Meifeng (孙美凤)¹

Author information

  • 1. Guangling College of Yangzhou University, Yangzhou, Jiangsu 225000
  • 2. Yangzhou Baoyang Digital Technology Co., Yangzhou, Jiangsu 225000

Abstract

To curb the spread of harmful sensitive information on the internet and foster a clean, civil online environment, this paper studies sensitive information detection in Chinese text. Based on a survey and analysis of existing work, it proposes a three-stage detection framework consisting of sensitive word library construction, suspicious text discovery, and sensitive information detection, and gives an execution strategy and methods for each stage. Experiments on a Word2vec-based method for expanding the sensitive word library show that the method is clearly effective.
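The abstract does not describe implementation details, so the following is only a hedged sketch of one common way the first two stages could work, not the authors' method. Stage one expands a seed sensitive word library by adding vocabulary words whose embedding has a high cosine similarity to a seed word; stage two flags texts that contain any library term for the later detection stage. The toy vectors, the 0.95 threshold, and the names `TOY_VECTORS`, `expand_lexicon`, and `find_suspicious` are all hypothetical. With a real trained gensim model, one would typically query `model.wv.most_similar(word, topn=...)` instead of the brute-force loop below.

```python
import math
from typing import Dict, List, Sequence, Set

# HYPOTHETICAL toy vectors standing in for a trained Word2vec model.
# The numbers are invented for illustration; real vectors would be learned
# from a large Chinese corpus.
TOY_VECTORS: Dict[str, List[float]] = {
    "赌博": [0.90, 0.10, 0.00],  # gambling
    "博彩": [0.85, 0.15, 0.05],  # betting / gaming
    "下注": [0.80, 0.20, 0.10],  # place a bet
    "天气": [0.00, 0.10, 0.95],  # weather
    "足球": [0.30, 0.90, 0.10],  # football
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_lexicon(seeds: Set[str], vectors: Dict[str, List[float]],
                   threshold: float = 0.95) -> Set[str]:
    """Stage 1 sketch: add every vocabulary word whose cosine similarity
    to some seed word is at least `threshold` (a hypothetical cut-off)."""
    expanded = set(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue  # out-of-vocabulary seed: nothing to compare against
        for word, vec in vectors.items():
            if word not in expanded and cosine(vectors[seed], vec) >= threshold:
                expanded.add(word)
    return expanded


def find_suspicious(texts: List[str], lexicon: Set[str]) -> List[str]:
    """Stage 2 sketch: keep texts containing at least one lexicon term.
    Plain substring matching is used because Chinese text has no spaces
    between words; a real system might segment words first."""
    return [t for t in texts if any(term in t for term in lexicon)]


if __name__ == "__main__":
    lexicon = expand_lexicon({"赌博"}, TOY_VECTORS)
    print(sorted(lexicon))
    texts = ["今天天气很好", "周末一起去下注吧", "足球比赛很精彩"]
    print(find_suspicious(texts, lexicon))
```

With these toy vectors, the seed 赌博 pulls in 博彩 and 下注 but not 天气 or 足球, and only the text mentioning 下注 is flagged. The brute-force comparison costs O(seeds × vocabulary) and is kept only for transparency. Stage three, deciding whether a flagged text is actually sensitive (for example with a trained classifier), is not shown.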

Keywords

Word2vec/sensitive information detection/Chinese text


Funding

2023 Jiangsu Province Innovation and Entrepreneurship Training Program for College Students (202313987020Y)

Publication year

2024
Software (软件)
Chinese Institute of Electronics; Tianjin Institute of Electronics

Impact factor: 1.51
ISSN: 1003-6970
Number of references: 9