A cognitive model of video QA based on multi-angle fusion and joint memory network
To address the limited cognitive and reasoning ability of existing video question answering (QA) models, an observer memory module was introduced, and a machine cognition model based on multi-angle fusion and a joint memory network was proposed. The model located the target object according to the question and extracted the corresponding regional features from the video. At the same time, the motion and appearance features of the video were combined. By adding a gated recurrent unit (GRU) with a temporal attention mechanism, the question features and video features were integrated more effectively for answer generation, which improved the model's cognitive reasoning ability. Experimental results showed that the model achieved higher accuracy than existing video QA models and demonstrated better reasoning and generalization ability on cognitive reasoning tasks, especially on the more difficult belief reasoning questions.
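
The fusion step described above can be illustrated with a minimal sketch of question-guided temporal attention over per-frame video features. This is not the authors' implementation: the function name `temporal_attention`, the fusion of appearance and motion features by concatenation, and the scaled dot-product scoring are all assumptions made for illustration, and the gated recurrent unit and observer memory module are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def temporal_attention(question_vec, appearance_feats, motion_feats):
    """Weight video frames by their relevance to the question.

    Assumptions (hypothetical, not taken from the paper):
    - appearance and motion features are fused by concatenation per frame;
    - the question vector has the same dimension as a fused frame vector;
    - relevance is a scaled dot product between question and frame.
    """
    # Fuse the two views of each frame into one vector (list concatenation).
    fused = [a + m for a, m in zip(appearance_feats, motion_feats)]
    scale = math.sqrt(len(question_vec))
    # One relevance score per time step.
    scores = [dot(question_vec, frame) / scale for frame in fused]
    weights = softmax(scores)
    # Attention-weighted sum of frame vectors: the question-aware video summary.
    dim = len(fused[0])
    context = [sum(w * frame[d] for w, frame in zip(weights, fused))
               for d in range(dim)]
    return weights, context


# Toy example: three frames, 2-d appearance and 2-d motion features.
question = [1.0, 0.0, 0.0, 1.0]
appearance = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
motion = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
weights, context = temporal_attention(question, appearance, motion)
```

In this toy example the first frame aligns best with the question vector and therefore receives the largest weight. In a full model, the resulting context vector would typically be passed to a recurrent unit or answer decoder rather than used directly.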