Weakly supervised video anomaly detection with local-global temporal dependency
Weakly Supervised Video Anomaly Detection (WS-VAD) is of great significance to the field of intelligent security. Current WS-VAD methods face the following problems: existing methods focus on discriminating individual video snippets while ignoring the local and global temporal dependencies among snippets; the temporal structure of anomalous events is ignored when the loss function is designed; and anomalous videos contain a large amount of normal-snippet noise, which interferes with training convergence. To address these problems, a WS-VAD method based on a Local-Global Temporal Dependency (LGTD) network was proposed. In this method, the LGTD network used a Multi-scale Temporal Feature Fusion (MTFF) module to capture the local temporal correlations of snippets over different time spans. At the same time, a Multi-Head Self-Attention (MHSA) module was employed to integrate the information of all snippets in a video and model the temporal correlation of the whole sequence. After that, a Squeeze-and-Excitation (SE) module was used to re-weight the internal features of each snippet, so that the temporal and spatial features of the snippets were captured more accurately and detection performance was significantly improved. In addition, the existing loss function was improved by introducing a complementary K-maxmin inner-bag loss and a Top-K outer-bag loss, which increase the probability that anomalous snippets are selected from anomalous videos for optimization training. Experimental results show that the proposed method achieves average Area Under the Curve (AUC) values of 83.18% and 95.41% on the UCF-Crime and ShanghaiTech datasets respectively, improvements of 0.08 and 7.21 percentage points over the Collaborative Normality Learning (CNL) method, demonstrating that the proposed method can effectively improve detection performance.
Key words: Video Anomaly Detection (VAD); weakly supervised learning; Multiple Instance Learning (MIL); multi-scale feature fusion; Multi-Head Self-Attention (MHSA) mechanism
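To make the pipeline concrete, the following is a minimal PyTorch sketch of the LGTD architecture as described above: an MTFF module using parallel 1-D convolutions with different kernel sizes to capture local temporal correlations over different time spans, an MHSA module for global dependency across all snippets, and an SE module that re-weights snippet feature channels before scoring. The feature dimension (2048, typical of I3D snippet features), the kernel sizes, head count, and reduction ratio are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class MTFF(nn.Module):
    # Multi-scale Temporal Feature Fusion: parallel 1-D convolutions with
    # different kernel sizes model local temporal correlations over
    # different time spans; the branches are then fused by a 1x1 conv.
    def __init__(self, dim, kernels=(1, 3, 5)):   # kernel sizes assumed
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernels)
        self.fuse = nn.Conv1d(dim * len(kernels), dim, 1)

    def forward(self, x):                # x: (B, T, D) snippet features
        x = x.transpose(1, 2)            # (B, D, T) for temporal convolution
        x = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fuse(x).transpose(1, 2)   # back to (B, T, D)

class SE(nn.Module):
    # Squeeze-and-Excitation: re-weights the feature channels of each snippet.
    def __init__(self, dim, reduction=16):        # reduction ratio assumed
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid())

    def forward(self, x):                # x: (B, T, D)
        return x * self.gate(x)          # per-snippet channel gating

class LGTD(nn.Module):
    def __init__(self, dim=2048, heads=8):        # dim/heads assumed
        super().__init__()
        self.mtff = MTFF(dim)                     # local temporal dependency
        self.mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.se = SE(dim)                         # channel re-weighting
        self.scorer = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, x):                # x: (B, T, D) I3D snippet features
        x = self.mtff(x)                 # local correlation across time spans
        x = x + self.mhsa(x, x, x)[0]    # global dependency over all snippets
        x = self.se(x)
        return self.scorer(x).squeeze(-1)   # (B, T) per-snippet anomaly scores

# Example: scores for 2 videos of 32 snippets each
scores = LGTD()(torch.randn(2, 32, 2048))   # -> shape (2, 32)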
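The improved objective can be read as a Top-K outer-bag ranking term between anomalous and normal videos, plus a complementary K-maxmin inner-bag term on the anomalous video that suppresses normal-snippet noise. The sketch below is one plausible formulation under that reading; the hinge margin, the value of k, the use of binary cross-entropy, and the way the two terms combine are assumptions rather than the paper's exact definitions.

import torch
import torch.nn.functional as F

def topk_outer_bag_loss(abn_scores, nor_scores, k=3, margin=1.0):
    # Rank the mean of the k highest anomalous-bag scores above the mean
    # of the k highest normal-bag scores by a margin (hinge loss); using
    # top-k instead of a single max raises the probability that true
    # anomalous snippets are selected for optimization.
    # abn_scores, nor_scores: (T,) per-snippet sigmoid scores of one video each.
    abn_top = abn_scores.topk(k).values.mean()
    nor_top = nor_scores.topk(k).values.mean()
    return torch.clamp(margin - abn_top + nor_top, min=0.0)

def k_maxmin_inner_bag_loss(abn_scores, k=3):
    # Complementary selection inside the anomalous bag: push the k highest
    # scores toward 1 (likely anomalous snippets) and the k lowest toward 0
    # (likely normal background), reducing the influence of normal-snippet
    # noise in anomalous videos.
    k_max = abn_scores.topk(k).values
    k_min = abn_scores.topk(k, largest=False).values
    return (F.binary_cross_entropy(k_max, torch.ones_like(k_max))
            + F.binary_cross_entropy(k_min, torch.zeros_like(k_min)))

# Example usage: combine the two terms for one anomalous/normal video pair
abn = torch.rand(32)   # scores of an anomalous video's snippets
nor = torch.rand(32)   # scores of a normal video's snippets
loss = topk_outer_bag_loss(abn, nor) + k_maxmin_inner_bag_loss(abn)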