Toxic comments detection based on bidirectional capsule network

To address the problem that existing detection models struggle to accurately identify malicious comments with varied linguistic styles and implicit semantics, a malicious comment detection model based on a bidirectional capsule network is proposed. First, the BERT model is used to embed the words of each comment, producing an input matrix. Second, the input matrix is passed to a bidirectional feature extraction layer consisting of stacked LSTMs, a bidirectional capsule network, and an attention network. This layer captures deep semantic information from the text in both the forward and backward directions; the resulting forward and backward matrices are concatenated and fed into the attention mechanism, which focuses on words related to malicious comments and generates an output vector. Third, the output vector is concatenated with a contextual auxiliary feature vector to enrich the feature representation. Finally, the concatenated vector is fed into a fully connected layer, and the comment is classified with a Sigmoid activation function. Experiments on the Wikipedia malicious comment dataset show that, compared with existing work, the proposed model achieves a significant performance improvement, captures richer semantic information in comment text, and detects malicious comments effectively.
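The last three stages of the pipeline (concatenating the forward and backward matrices, attention pooling, and sigmoid classification over the vector joined with the auxiliary features) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The abstract does not specify the attention scoring function, so dot-product scoring against a single query vector is assumed here, and the names `attention_pool`, `classify`, `query`, `fc_weights`, and `bias` are hypothetical. The forward and backward matrices are taken as given (already produced by the upstream LSTM and capsule layers) and are assumed to be aligned by token position.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attention_pool(rows, query):
    """Assumed dot-product attention: score each token row against a query
    vector, normalise with softmax, and return the weighted sum of rows."""
    scores = [sum(r * q for r, q in zip(row, query)) for row in rows]
    weights = softmax(scores)
    dim = len(rows[0])
    pooled = [sum(w * row[j] for w, row in zip(weights, rows)) for j in range(dim)]
    return pooled, weights


def classify(forward, backward, query, aux, fc_weights, bias):
    """Concatenate per-token forward/backward features, pool them with
    attention, append the auxiliary features, and apply one sigmoid unit."""
    # Per-token concatenation of the forward and backward feature rows.
    rows = [f + b for f, b in zip(forward, backward)]
    # Attention produces a single output vector for the whole comment.
    pooled, attn = attention_pool(rows, query)
    # Enrich the representation with the auxiliary feature vector.
    features = pooled + aux
    # Fully connected layer with a single output, followed by sigmoid.
    z = sum(w * x for w, x in zip(fc_weights, features)) + bias
    return sigmoid(z), attn


# Toy input: two tokens, two forward and two backward features per token.
forward = [[1.0, 0.0], [0.0, 1.0]]
backward = [[0.5, 0.5], [0.5, 0.5]]
aux = [1.0]  # one hypothetical auxiliary feature
prob, attn = classify(forward, backward, [10.0, 0.0, 0.0, 0.0], aux,
                      [1.0, 0.0, 0.0, 0.0, 0.0], 0.0)
print(prob, attn)
```

In a trained model the query vector, the fully connected weights, and the bias would be learned parameters, and a multi-label setting would use one sigmoid output per label rather than the single unit shown here.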

Keywords: BERT language model; bidirectional capsule network; contextual auxiliary features; toxic comments detection

Authors: Li Gongjin (李公瑾), Shao Yubin (邵玉斌), Du Qingzhi (杜庆治), Long Hua (龙华), Ma Dinan (马迪南)

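Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, Yunnan, China

Yunnan Key Laboratory of Media Convergence, Kunming 650032, Yunnan, China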

Funding: Project of the Yunnan Key Laboratory of Media Convergence (320225403)

Year: 2024

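Journal: Computer Engineering & Science (计算机工程与科学)
Publisher: College of Computer, National University of Defense Technology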

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.787
ISSN: 1007-130X
Year, volume (issue): 2024, 46(10)