Image Caption Generation Method Based on Priori Lexical Mechanism
In computer vision fields such as object detection and image retrieval, prior knowledge, including predefined frames, labels, and category information, is used to guide model training and improve precision and efficiency. Image captioning typically uses image features or historical semantic information as prior knowledge, yet often overlooks the prior information contained in the image itself. To capture this information, a new image caption generation method based on an a priori vocabulary mechanism (PVM) is proposed. The method uses Faster R-CNN to extract image features and incorporates a prior vocabulary generation method that employs multi-instance learning to extract prior information from images. In addition, a prior feature extraction module is designed to derive prior features from both the prior vocabulary and the image features. Finally, these prior features are fed into an enhanced Transformer to generate descriptive sentences, thereby guiding the model to integrate the lexical prior information of the image. The proposed method is evaluated on the MSCOCO dataset, achieving scores of 38.7% on BLEU_4 and 128.5% on CIDEr, improvements of 1.7% and 6.7%, respectively, over the baseline models. These results indicate that the descriptions generated by the model are more accurate and comprehensive, which demonstrates the effectiveness of the proposed method.
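The abstract does not specify how multi-instance learning produces the prior vocabulary. As a rough illustration only, the sketch below assumes a noisy-OR formulation, a common choice for multi-instance word detection, in which each image region is an instance and a word is kept if at least one region is likely to show it. The function names, the `top_k` parameter, and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def noisy_or(region_probs: List[float]) -> float:
    """Image-level probability that a word applies, assuming a noisy-OR
    over per-region probabilities (an assumption, not the paper's stated model)."""
    miss = 1.0
    for p in region_probs:
        miss *= 1.0 - p  # probability that no region so far shows the word
    return 1.0 - miss


def prior_vocabulary(word_region_probs: Dict[str, List[float]], top_k: int = 3) -> List[str]:
    """Rank candidate words by image-level probability and keep the top_k.
    Ties are broken alphabetically so the result is deterministic."""
    scores = {word: noisy_or(probs) for word, probs in word_region_probs.items()}
    ranked = sorted(scores, key=lambda word: (-scores[word], word))
    return ranked[:top_k]


# Hypothetical per-region word probabilities for one image with three regions.
example = {
    "dog": [0.9, 0.2, 0.1],
    "frisbee": [0.1, 0.7, 0.3],
    "grass": [0.5, 0.5, 0.5],
    "car": [0.05, 0.02, 0.01],
}

print(prior_vocabulary(example, top_k=3))
```

In a pipeline like the one described, the selected words would then be embedded and combined with the Faster R-CNN region features before being passed to the Transformer; how that combination is done is left to the paper's prior feature extraction module and is not modelled here.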