Fake News Detection Based on Pre-Training and Multi-Modal Fusion
Existing multi-modal detection models typically rely on a simple concatenation of features from each modality and are often ineffective at modeling the correlations between modalities. Furthermore, migrating these models to domains with sparse labels is challenging. In this paper, PMFD, a model based on pre-training and multi-modal fusion, is proposed. First, image original vectors are extracted from different regions of the images attached to a news item and merged to form an image guide vector. Three distinct multi-modal fusion stages are designed: early fusion, middle fusion, and post fusion. In early fusion, the text feature extractor is initialized with the image guide vector, yielding text original vectors that are then merged into a text guide vector. In middle fusion, each modality's feature representation is constructed by combining that modality's original vectors with the guide vectors of the other modalities. In post fusion, the feature representations of the different modalities are fused to construct the feature representation of the news item. To enhance generalization, PMFD is first pre-trained on label-rich data and then fine-tuned on label-sparse data. Experimental results on public datasets show that this approach improves accuracy by over 10% compared with traditional models such as CNN, LSTM, and BERT, and by 2%-3% over the existing EANN and M_model multi-modal fake news detection models.
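As a rough illustration of how the three fusion stages fit together, the sketch below wires them up in PyTorch. It is a minimal sketch, not the authors' implementation: the class name PMFDSketch, all layer names and dimensions, the GRU text encoder, and the use of mean-pooling to merge region/token vectors into guide vectors are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class PMFDSketch(nn.Module):
    """Illustrative three-stage fusion: early, middle, and post fusion."""

    def __init__(self, img_dim=2048, txt_dim=768, hid_dim=256):
        super().__init__()
        # Project per-region image vectors; their pooled mean serves as
        # the image guide vector (pooling choice is an assumption).
        self.img_proj = nn.Linear(img_dim, hid_dim)
        # Early fusion: map the image guide vector to the text encoder's
        # initial hidden state.
        self.txt_init = nn.Linear(hid_dim, hid_dim)
        self.txt_rnn = nn.GRU(txt_dim, hid_dim, batch_first=True)
        # Middle fusion: combine each modality's original vectors with
        # the other modality's guide vector.
        self.img_fuse = nn.Linear(hid_dim * 2, hid_dim)
        self.txt_fuse = nn.Linear(hid_dim * 2, hid_dim)
        # Post fusion: fuse both modality representations and classify.
        self.classifier = nn.Linear(hid_dim * 2, 2)

    def forward(self, img_regions, txt_tokens):
        # img_regions: (B, n_regions, img_dim); txt_tokens: (B, T, txt_dim)
        img_orig = self.img_proj(img_regions)            # image original vectors
        img_guide = img_orig.mean(dim=1)                 # image guide vector
        # Early fusion: image guide vector initializes the text encoder.
        h0 = torch.tanh(self.txt_init(img_guide)).unsqueeze(0)
        txt_orig, _ = self.txt_rnn(txt_tokens, h0)       # text original vectors
        txt_guide = txt_orig.mean(dim=1)                 # text guide vector
        # Middle fusion: each modality's original vectors are combined
        # with the other modality's guide vector.
        img_rep = self.img_fuse(torch.cat([img_orig.mean(1), txt_guide], -1))
        txt_rep = self.txt_fuse(torch.cat([txt_orig.mean(1), img_guide], -1))
        # Post fusion: concatenate both representations into the news
        # representation and predict real/fake logits.
        return self.classifier(torch.cat([img_rep, txt_rep], dim=-1))
```

For example, `PMFDSketch()(torch.randn(4, 49, 2048), torch.randn(4, 30, 768))` returns a (4, 2) tensor of real/fake logits for a batch of four news items, each with 49 image regions and 30 text tokens.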