Visual Story Generation Based on Adaptive Pre-trained Model Matching
The visual story generation task is to generate expressive and coherent sentences that describe the visual content of an image sequence. To coordinate pre-trained models for the visual and linguistic modalities, this article proposes a visual story generation algorithm based on adaptive pre-trained model alignment. On the one hand, it comprehensively mines the diverse, complementary information in image streams, covering visual, relational, sequential, and other aspects. On the other hand, it applies an adaptive matching loss to align the features of the two modalities, as well as the continuity information across the image stream. Compared with recent state-of-the-art algorithms on the VIST dataset, the proposed method achieves competitive results in terms of image-text relevance, text diversity, and the logical coherence of the story content.