Enhanced Domain Multi-modal Entity Recognition Based on Knowledge Graph
Addressing the limitations of Chinese Named Entity Recognition (NER) within specific domains, this paper proposes a model that enhances entity recognition accuracy by utilizing domain-specific Knowledge Graphs (KGs) and images. The proposed model leverages domain graphs and images to improve entity recognition accuracy in short texts related to computer science. The model employs a Bidirectional Encoder Representations from Transformers (BERT)-Bidirectional Long Short-Term Memory (BiLSTM)-Attention-based model to extract textual features, a ResNet152-based approach to extract image features, and a word segmentation tool to obtain noun entities from sentences. These noun entities, together with the KG nodes, are then embedded using BERT. The model uses cosine similarity to determine the KG node most similar to each segmented word in the sentence, and retains the neighboring nodes at a distance of 1 from that node to generate an optimal matching subgraph for semantic enrichment of the sentence. A Multi-Layer Perceptron (MLP) is employed to map the textual, image, and subgraph features into the same space. A gating mechanism is utilized to achieve fine-grained cross-modal fusion of the textual and image features. Finally, the multimodal features are fused with the subgraph features via a cross-attention mechanism and then fed into the decoder for entity labeling. Experimental comparisons with relevant baseline models were conducted on Twitter2015, Twitter2017, and a self-constructed computer science dataset. The results indicate that the proposed approach achieved precision, recall, and F1 values of 88.56%, 87.47%, and 88.01% on the domain dataset; compared with the optimal baseline model, its F1 value increased by 1.36 percentage points, demonstrating the effectiveness of incorporating domain KGs for entity recognition.
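The subgraph-matching step described above can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the toy 3-dimensional embeddings, node names, and the `match_subgraph` helper are hypothetical stand-ins for BERT node/word embeddings and the real domain KG.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_subgraph(word_emb, node_embs, edges):
    """Pick the KG node most similar to a segmented word (by cosine
    similarity) and retain its distance-1 neighbours as the subgraph.

    word_emb : embedding of one noun entity from the sentence
    node_embs: {node_id: embedding} for all KG nodes
    edges    : set of (u, v) undirected edges in the KG
    """
    best = max(node_embs, key=lambda n: cosine_sim(word_emb, node_embs[n]))
    neighbours = ({v for u, v in edges if u == best}
                  | {u for u, v in edges if v == best})
    return best, neighbours

# Toy example: 3-dimensional stand-in embeddings (hypothetical data).
nodes = {"CPU":   np.array([1.0, 0.0, 0.0]),
         "GPU":   np.array([0.7, 0.7, 0.0]),
         "cache": np.array([0.0, 1.0, 0.0])}
edges = {("CPU", "cache"), ("CPU", "GPU")}
best, sub = match_subgraph(np.array([0.95, 0.05, 0.0]), nodes, edges)
# best is "CPU"; sub contains its 1-hop neighbours "GPU" and "cache"
```

The retained node plus its 1-hop neighbourhood is what the model feeds forward as the matching subgraph feature.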
Keywords: Named Entity Recognition (NER); multi-modal; domain; Knowledge Graph (KG); cross-modal feature fusion; attention mechanism
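The gated cross-modal fusion mentioned in the abstract can be sketched as an element-wise gate over the projected text and image features. This is a hedged numpy sketch under assumed shapes; the weight matrix `W`, bias `b`, and random feature vectors are illustrative, not the paper's trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(text_feat, img_feat, W, b):
    """Per-dimension gate deciding how much of the image feature
    to mix into the text feature (fine-grained fusion sketch)."""
    gate = sigmoid(np.concatenate([text_feat, img_feat]) @ W + b)
    # gate is in (0, 1), so each fused dimension is a convex
    # combination of the corresponding text and image dimensions.
    return gate * text_feat + (1.0 - gate) * img_feat

d = 4  # assumed common feature dimension after the MLP projection
text_feat = rng.standard_normal(d)
img_feat = rng.standard_normal(d)
W = rng.standard_normal((2 * d, d))
b = np.zeros(d)
fused = gated_fusion(text_feat, img_feat, W, b)
```

Because the gate is element-wise, each dimension of the fused vector leans toward whichever modality the learned gate favours, which is what enables fine-grained (per-feature) rather than whole-vector fusion.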