Research on a Cyberbullying Text Detection Model Based on the SHAP Explanation Tool
Liu Dong (刘冬)¹, Liu Ruili (刘瑞丽)¹, Weng Haiguang (翁海光)¹
Author information
- 1. Department of Informatization and Network Security, Shanghai Police College, Shanghai 200137, China
Abstract
To quickly identify whether text posted on social network platforms is cyberbullying, a cyberbullying text detection model based on RoBERTa-BiGRU was proposed. First, a pretrained RoBERTa model was used to extract semantic features from the text, and a BiGRU was used to further integrate and refine these features. Second, the classification performance of the RoBERTa-BiGRU model was evaluated on the cyberbullying text detection dataset CB-tweets. Finally, the SHAP interpretation tool was introduced to compare and analyze the key features identified by the model against the baseline values, from both global and local perspectives. Experimental results showed that the RoBERTa-BiGRU model achieved higher classification accuracy. The interpretability analysis further showed that the keywords computed by RoBERTa-BiGRU for the Age, Ethnicity, Gender, and Religion categories matched the themes of those category labels. However, the keywords found for the Other CB and Not CB categories were mostly rare characters and run-together words, indicating that the model did not truly capture the intrinsic feature differences between the Other CB and Not CB categories.
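The SHAP analysis summarized above rests on Shapley values: each token's contribution to a prediction is measured relative to a base value, a local explanation attributes one prediction to its tokens, and a global view aggregates those attributions over many texts. The stdlib-only Python sketch below illustrates these three ideas on a hypothetical toy scorer (`toy_score`, its weights, and all example tokens are invented for illustration). It is not the authors' RoBERTa-BiGRU model and does not use the `shap` library; it assumes that masking a token means dropping it and that the base value is the output on the fully masked (empty) input. Because it enumerates every coalition exactly, it is only feasible for very short inputs.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy scorer for ONE class (e.g. "Religion"); NOT the paper's
# RoBERTa-BiGRU model. Weights and tokens are invented for illustration.
WEIGHTS = {"religion": 0.6, "faith": 0.5, "idiot": 0.4}
BIAS = 0.1


def toy_score(tokens):
    """Class score for a list of (unmasked) tokens."""
    score = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    # Interaction term: "stupid" only adds to the score alongside "religion".
    if "stupid" in tokens and "religion" in tokens:
        score += 0.2
    return score


def shapley_values(tokens, model):
    """Exact Shapley value of every token position.

    Masking is simplified to dropping tokens (an assumption), so the empty
    coalition yields the base value model([]). Cost grows as 2**len(tokens).
    """
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                without_i = [tokens[j] for j in subset]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def global_importance(corpus, model):
    """Rank tokens by mean |SHAP| over their occurrences in the corpus."""
    totals, counts = {}, {}
    for tokens in corpus:
        for tok, value in zip(tokens, shapley_values(tokens, model)):
            totals[tok] = totals.get(tok, 0.0) + abs(value)
            counts[tok] = counts.get(tok, 0) + 1
    return sorted(((totals[t] / counts[t], t) for t in totals), reverse=True)


if __name__ == "__main__":
    sentence = ["stupid", "religion", "people"]
    base = toy_score([])
    phi = shapley_values(sentence, toy_score)
    # Local view: per-token contributions for one prediction
    for tok, value in zip(sentence, phi):
        print(f"{tok:>8}: {value:+.3f}")
    # Local accuracy: base value + sum of contributions equals the output
    print(f"base {base:.3f} + sum {sum(phi):.3f} = output {toy_score(sentence):.3f}")
    # Global view: aggregate attributions over a small corpus
    corpus = [sentence, ["faith", "idiot"], ["nice", "day"]]
    for score, tok in global_importance(corpus, toy_score):
        print(f"{tok:>9}: {score:.3f}")
```

In this toy setup the interaction bonus is split evenly between "stupid" and "religion", and the base value plus the token contributions reproduces the model output exactly. Mean absolute SHAP value per token is only one simple way to build a global ranking. When the top-ranked tokens are rare strings rather than class-related terms, as the abstract reports for Other CB and Not CB, that ranking hints that the model leans on surface cues.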
Keywords
Cyberbullying/SHAP/RoBERTa/BiGRU/text detection
Publication year
2024