
A Short Text Sensitive Word Recognition Method Based on Bi-LSTM Neural Network

To identify and handle sensitive words accurately, and to address the problems of high word segmentation latency and low recognition accuracy, this paper proposes a short-text sensitive word recognition method based on a bidirectional long short-term memory (Bi-LSTM) neural network. The sensitive lexicon is analyzed and divided into two categories and three levels, and interference in the short text (special characters, traditional Chinese characters, and split Chinese characters) is preprocessed. A Bi-LSTM neural network is then used to build a short-text word segmentation model, whose optimal parameters are determined by a second round of training. Word sensitivity values are computed repeatedly, sensitive words are extracted with a sensitivity comparison function, and the extracted words are matched against the sensitive lexicon to determine their category and level, which completes the recognition of sensitive words in short text. Experimental results show that, across the different experimental groups, the segmentation latency obtained with the proposed method stays below the given maximum limit and the recognition accuracy for sensitive words in short text is higher than 84.42%, indicating good application performance.
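The abstract does not give implementation details, so the following is only a minimal sketch of the final two stages it names (preprocessing interference characters, then matching candidate words against a lexicon that carries a category and a level). Everything concrete here is an assumption made for illustration: the `LEXICON` entries, the category names, the `max_distance` threshold, and the use of whitespace splitting in place of the Bi-LSTM segmentation model. Edit distance is used for fuzzy matching because it appears among the paper's keywords; how the authors actually apply it is not stated.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical lexicon: word -> (category, level). These entries are
# placeholders, not the paper's actual sensitive lexicon.
LEXICON = {
    "badword": ("category_a", 3),
    "spamlink": ("category_b", 1),
}


def normalize(text: str) -> str:
    """Lower-case the text and drop special characters (keeps letters, digits, whitespace)."""
    return re.sub(r"[^\w\s]|_", "", text.lower())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def match_lexicon(token: str, max_distance: int = 1) -> Optional[Tuple[str, str, int]]:
    """Return (word, category, level) for the closest lexicon entry within max_distance."""
    best = None
    for word, (category, level) in LEXICON.items():
        d = edit_distance(token, word)
        if d <= max_distance and (best is None or d < best[0]):
            best = (d, word, category, level)
    return None if best is None else best[1:]


def find_sensitive_words(text: str) -> List[Tuple[str, str, int]]:
    """Whitespace split stands in for the segmentation model (an assumption)."""
    hits = []
    for token in normalize(text).split():
        match = match_lexicon(token)
        if match is not None:
            hits.append(match)
    return hits
```

Calling `find_sensitive_words("Click this sp@mlink now, b.a.d.w.o.r.d!")` would flag both obfuscated words, since stripping the special characters leaves tokens within one edit of a lexicon entry. A real system along the lines of the abstract would replace the whitespace split with the trained segmentation model and would also need to handle traditional and split Chinese characters, which this sketch does not attempt.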

Keywords: short text; sensitive word recognition; text filtering; edit distance; bidirectional long short-term memory neural network

Zhou Junya (周军芽), Wu Jinwei (吴进伟), Wu Guangfei (吴广飞), Zhang Hewei (张何为)


Lishui Power Supply Company, State Grid Zhejiang Electric Power Co., Ltd., Lishui 323000, Zhejiang, China


2024

Journal of Wuhan University of Technology (Information & Management Engineering)
Wuhan University of Technology


CSTPCD
Impact factor: 0.37
ISSN:2095-3852
Year, volume (issue): 2024, 46(2)