
Multi-language Hate Speech Recognition Based on Multi-teacher Knowledge Distillation

Hate speech recognition on social media is a critical task in the field of open-source intelligence. To address the poor recognition performance of multilingual text models and the heavy computational resource requirements of pre-trained models, we propose a multi-teacher knowledge distillation scheme. First, several large language models are used to obtain probability distribution matrices. Then, comprehensive soft labels are generated from integrated general relevance weights and language-specific advantage weights to guide the training of the student model. Experimental results show that the distilled student model significantly reduces computation time and saves computational resources while retaining the language-specific advantages of each teacher model.
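The weighting scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual formulation: the function names, the multiplicative combination of the two weight types, the temperature value, and all numeric weights are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def combined_soft_labels(teacher_logits, relevance_w, language_w, temperature=2.0):
    """Blend per-teacher probability distributions into one soft label.

    teacher_logits: list of (batch, classes) logit matrices, one per teacher
    relevance_w:    per-teacher general relevance weights (hypothetical)
    language_w:     per-teacher weights for the current language (hypothetical)
    """
    w = np.asarray(relevance_w, dtype=float) * np.asarray(language_w, dtype=float)
    w = w / w.sum()  # normalize so the blended rows still sum to 1
    probs = np.stack([softmax(t, temperature) for t in teacher_logits])
    # Weighted average over the teacher axis -> (batch, classes)
    return np.tensordot(w, probs, axes=1)

def distill_loss(student_logits, soft_labels, temperature=2.0):
    """Cross-entropy of the student's softened output against the blended soft labels."""
    log_p = np.log(softmax(student_logits, temperature) + 1e-12)
    return -np.mean(np.sum(soft_labels * log_p, axis=-1))
```

In a training loop, `distill_loss` would be minimized with respect to the student's parameters; the combined weights let a teacher that is stronger on the current language dominate the soft label for that batch.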

hate speech recognition; multilingual text; knowledge distillation; large language models

Zhou Zifan, Li Zhi


Graduate School, China People's Police University, Langfang, Hebei 065000

School of Smart Policing, China People's Police University, Langfang, Hebei 065000


2024

Journal of China People's Police University
Chinese People's Armed Police Force Academy


Impact factor: 0.378
ISSN: 2097-0900
Year, volume (issue): 2024, 40(10)