CLGLF: Confidence Learning Guides Label Fusion for Multimodal Named Entity Recognition Method
To address visual semantic understanding bias and multimodal semantic bias in multimodal named entity recognition, a confidence learning guides label fusion (CLGLF) method is proposed. The method uses the pre-trained BLIP-2 model to generate image captions, concatenates them with the input text, and jointly encodes the result to achieve multimodal feature fusion. Decoding the multimodal representations yields candidate labels, and decoding the text representations yields text labels. A KL divergence loss aligns the two groups of labels; on this basis, a confidence score is calculated to evaluate the quality of the multimodal representation, and a confidence threshold is set to screen out biased candidate labels. Each biased candidate label is replaced by the text label at the corresponding position, which achieves label fusion and completes multimodal named entity recognition. To verify the proposed method, experiments are carried out on the Twitter-2015 and Twitter-2017 multimodal datasets, and the results are compared with seven mainstream methods, including MSB and UMT. The experimental results show the effectiveness of CLGLF.
Keywords: multimodal named entity recognition; image caption; confidence learning; multimodal semantic bias; information extraction
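The label fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the confidence score, so the sketch assumes, purely for illustration, that a token's confidence is the largest probability in its multimodal label distribution. The function names `kl_divergence` and `fuse_labels`, the direction of the KL term, and the threshold value are likewise hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two label distributions for one token.

    ASSUMPTION: the direction of the KL term is not stated in the abstract;
    this only illustrates an alignment term of that kind.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def fuse_labels(
    mm_probs: Sequence[Sequence[float]],
    text_probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    threshold: float,
) -> List[str]:
    """Replace low-confidence candidate labels with text labels, token by token.

    mm_probs:   per-token label distributions decoded from the multimodal representation
    text_probs: per-token label distributions decoded from the text-only representation
    """
    fused = []
    for p_mm, p_txt in zip(mm_probs, text_probs):
        # ASSUMPTION: confidence = max probability of the multimodal distribution.
        confidence = max(p_mm)
        # Keep the candidate label if it is confident enough,
        # otherwise fall back to the text label at the same position.
        source = p_mm if confidence >= threshold else p_txt
        best_index = max(range(len(source)), key=lambda i: source[i])
        fused.append(labels[best_index])
    return fused


# Toy example with three tokens and three labels (made-up numbers).
labels = ["O", "B-PER", "B-LOC"]
mm_probs = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.10, 0.10, 0.80]]
text_probs = [[0.80, 0.10, 0.10], [0.10, 0.80, 0.10], [0.20, 0.20, 0.60]]

fused = fuse_labels(mm_probs, text_probs, labels, threshold=0.6)
print(fused)
```

In this toy input, the second token's multimodal distribution peaks at 0.40, below the assumed threshold of 0.6, so its label is taken from the text distribution; the other two tokens keep their candidate labels.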