Research on multimodal link prediction method based on Vision Transformer
To address problems such as the insufficient feature representation of existing link prediction methods, a multimodal link prediction method based on the Vision Transformer is proposed. First, pHash is employed at the filter gate to filter out irrelevant images. Second, image features are extracted with the Vision Transformer model and processed by MRP in the forget gate. Finally, the image features are fused with entity and relation features in the fusion gate. Experimental results show that the proposed model achieves good multimodal link prediction performance on the WN18-IMG and OpenBG-IMG datasets.
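To make the gate pipeline concrete, the following Python sketch illustrates one plausible reading of the three gates using PyTorch, torchvision, and the imagehash library: a filter gate that discards images whose pHash is far from a reference image, a ViT-B/16 encoder for image features, and a sigmoid-gated fusion of image features with entity/relation embeddings. The class names, pHash threshold, and embedding dimensions are illustrative assumptions rather than the paper's exact implementation, and the MRP computation in the forget gate is omitted because its definition is not given in the abstract.

import torch
import torch.nn as nn
from PIL import Image
import imagehash
from torchvision import transforms
from torchvision.models import vit_b_16


def filter_gate(candidate_paths, reference_path, max_hamming=16):
    """Filter gate: drop images whose pHash is too far from a reference image."""
    ref_hash = imagehash.phash(Image.open(reference_path))
    kept = []
    for path in candidate_paths:
        if imagehash.phash(Image.open(path)) - ref_hash <= max_hamming:
            kept.append(path)
    return kept


class VitImageEncoder(nn.Module):
    """Extract a 768-d image feature with a ViT-B/16 backbone (untrained here)."""
    def __init__(self):
        super().__init__()
        self.vit = vit_b_16(weights=None)      # load pretrained weights in practice
        self.vit.heads = nn.Identity()         # keep the [CLS] feature, drop the classifier
        self.preprocess = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
        ])

    def forward(self, pil_images):
        batch = torch.stack([self.preprocess(img.convert("RGB")) for img in pil_images])
        return self.vit(batch)                 # shape (N, 768)


class FusionGate(nn.Module):
    """Fusion gate: sigmoid-gated mixing of image features with entity/relation features."""
    def __init__(self, img_dim=768, ent_dim=200):
        super().__init__()
        self.proj = nn.Linear(img_dim, ent_dim)     # project image feature to entity space
        self.gate = nn.Linear(2 * ent_dim, ent_dim)

    def forward(self, img_feat, ent_feat):
        img_feat = self.proj(img_feat)
        g = torch.sigmoid(self.gate(torch.cat([img_feat, ent_feat], dim=-1)))
        return g * img_feat + (1 - g) * ent_feat    # fused entity representation


if __name__ == "__main__":
    encoder = VitImageEncoder()
    fusion = FusionGate()
    imgs = [Image.new("RGB", (224, 224))]           # placeholder image
    img_feat = encoder(imgs)
    ent_feat = torch.randn(1, 200)                  # placeholder entity embedding
    print(fusion(img_feat, ent_feat).shape)         # torch.Size([1, 200])

In an end-to-end model, the fused representation would feed a standard link prediction scorer (for example a translational or bilinear scoring function) and be trained jointly with the entity and relation embeddings.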