A Method for Private Video Action Recognition Based on Visual-Audio Complementarity and Semantic Clarity
Video privacy protection is one of the major challenges facing society today, and blurring videos is an important means of protecting people's privacy. Because blurred videos inherently lack visual information, mainstream video action recognition algorithms fail to achieve satisfactory results on them. As a multimodal medium, a blurred video contains not only visual information but also rich audio information, and from the perspective of human cognition, audio is also an important source of information. In view of this, this article proposes a private video action recognition method based on multimodal fusion, which recognizes human actions without infringing on user privacy. Specifically, an audio-visual feature fusion module integrates audio feature maps into the visual modality, fully fusing the deep semantic information of the audio and video modalities. In addition, during the training phase the model uses clear video frames as labels to supervise the parameter updates of the action recognition network, providing clear semantic information to the private video action recognition network. Extensive ablation and comparative experiments on multiple private behavior datasets verify the effectiveness of the proposed method.
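The abstract names two mechanisms, fusing audio features into visual feature maps and supervising the blurred-video branch with features from clear frames, but does not specify how either is implemented. The sketch below illustrates one plausible reading under stated assumptions and is not the authors' module: the fusion is a hypothetical gated additive fusion (the audio embedding is linearly projected to the visual channel count, passed through a sigmoid gate, and added at every spatial position), and the clear-frame supervision is assumed to be a mean-squared-error feature-matching loss between the blurred-video features (student) and clear-frame features (teacher). The names `fuse_audio_into_visual` and `clear_frame_supervision_loss` are illustrative only, and plain Python lists stand in for the tensors a real implementation would use.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # indexed as [height][width][channels]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Dense projection y = W x + b, where W has shape [out][in]."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def fuse_audio_into_visual(visual: FeatureMap, audio: Vector,
                           weight: List[Vector], bias: Vector) -> FeatureMap:
    """Hypothetical gated additive fusion (an assumption, not the paper's module).

    The audio embedding is projected to the visual channel count; each projected
    channel is scaled by its own sigmoid gate and added to that channel at every
    spatial position of the visual feature map.
    """
    a = linear(audio, weight, bias)        # audio projected to C channels
    gate = [sigmoid(v) for v in a]         # per-channel gate in (0, 1)
    return [[[v + g * a_c for v, g, a_c in zip(pixel, gate, a)]
             for pixel in row]
            for row in visual]


def clear_frame_supervision_loss(student: FeatureMap, teacher: FeatureMap) -> float:
    """Assumed MSE feature matching between the blurred-video branch (student)
    and features extracted from the corresponding clear frames (teacher)."""
    diffs = [(s - t) ** 2
             for s_row, t_row in zip(student, teacher)
             for s_pix, t_pix in zip(s_row, t_row)
             for s, t in zip(s_pix, t_pix)]
    return sum(diffs) / len(diffs)


# Toy usage: a 1x2 visual map with 2 channels and a 3-dimensional audio embedding.
visual = [[[0.0, 1.0], [1.0, 0.0]]]
audio = [1.0, 2.0, 3.0]
weight = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # hypothetical projection weights
bias = [0.0, 0.0]
fused = fuse_audio_into_visual(visual, audio, weight, bias)
teacher = [[[0.0, 0.0], [0.0, 0.0]]]
print(clear_frame_supervision_loss(visual, teacher))  # prints 0.5
```

In training, such a supervision term would typically be added to the classification loss with a weighting factor, so that the clear frames guide the blurred-video network only during training, while inference needs only the blurred video and its audio.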