Research on Rumor Detection Based on Multilevel Structure and Semi-Supervised Learning
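Zhang Yanke (张岩珂) 1, Dan Zhiping (但志平) 1, Dong Fangmin (董方敏) 1, Gao Zhun (高准) 1, Zhang Hongzhi (张洪志) 1
Author information
- 1. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang 443002; College of Computer and Information Technology, China Three Gorges University, Yichang 443002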
Abstract
Most current rumor detection methods rely on supervised learning, which requires manually labeled data and therefore delays detection. Social media generates a large amount of information, only a small portion of which can be labeled by professionals as true or false rumors. To make full use of the large volume of unlabeled data and detect false rumors in a timely manner, this paper proposes a rumor detection model based on multi-level structure and semi-supervised learning, the multi-level semi-supervised graph convolutional neural network (MSGCN). First, the model builds a multi-level detection module that trains a graph convolutional network on the limited labeled samples to extract multi-level propagation structure features, diffusion structure features, and global structure features. Second, it introduces random model perturbation and ensembles the dynamic outputs on unlabeled data for consistency prediction, and it proposes a complementary pseudo-label method to obtain high-quality pseudo-labeled data, which are added to the labeled data to expand the training set. Finally, the model is trained under the joint constraint of a supervised cross-entropy loss and an unsupervised consistency loss. Experiments on the public Twitter15, Twitter16, and Weibo datasets show that, with 30% of the samples labeled, the proposed model reaches accuracies of 88.3%, 90.1%, and 95.5%, respectively, indicating strong performance with only a small number of labeled samples.
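The training objective described in the abstract combines a supervised cross-entropy term on labeled samples with an unsupervised consistency term on unlabeled samples. The sketch below illustrates that general idea only; it is not the MSGCN implementation. The function names, the mean-squared-difference form of the consistency term, the `weight` parameter, and the confidence `threshold` are all assumptions made for illustration. In particular, `select_pseudo_labels` uses plain confidence thresholding as a stand-in, since the abstract does not specify how the complementary pseudo-label method works.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class (supervised term)."""
    return -math.log(max(probs[label], 1e-12))


def consistency(probs_a, probs_b):
    """Mean squared difference between two predictions for one sample.

    Assumption: the two predictions come from two randomly perturbed
    forward passes (for example, different dropout masks).
    """
    return sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)


def total_loss(labeled, unlabeled, weight=1.0):
    """Supervised cross-entropy plus weighted unsupervised consistency.

    labeled:   list of (logits, label) pairs
    unlabeled: list of (logits_a, logits_b) pairs from two perturbed passes
    weight:    hypothetical balancing coefficient for the consistency term
    """
    sup = sum(cross_entropy(softmax(z), y) for z, y in labeled)
    sup /= max(len(labeled), 1)
    unsup = sum(consistency(softmax(a), softmax(b)) for a, b in unlabeled)
    unsup /= max(len(unlabeled), 1)
    return sup + weight * unsup


def select_pseudo_labels(unlabeled_probs, threshold=0.9):
    """Return (index, predicted_class) for confident unlabeled samples.

    Simple confidence thresholding, used here as a stand-in for the
    paper's complementary pseudo-label method.
    """
    selected = []
    for i, probs in enumerate(unlabeled_probs):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected.append((i, best))
    return selected
```

In a training loop of this kind, the samples returned by `select_pseudo_labels` would be moved into the labeled set with their predicted classes before the next round, which is how the labeled data grows from a small initial fraction.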
Key words
rumor detection/semi-supervised/multilevel structure/pseudo label
Funding
NSFC-Xinjiang Joint Fund (U1703261)
Publication year
2024