Study on Enhancing the Robustness of RGB-skeleton Action Recognition Based on the Feature Interaction Module
Malicious attackers can easily deceive neural networks by adding human-imperceptible adversarial noise to natural samples, leading to misclassification. Previous research on defending models against such adversarial perturbations has concentrated predominantly on single-modal tasks, with insufficient exploration of multimodal scenarios. This paper therefore aims to improve the robustness of multimodal RGB-skeleton action recognition. It introduces a robust action recognition framework based on a Feature Interaction Module (FIM), which extracts global information from adversarial samples to learn inter-modal joint representations that are used to calibrate the multimodal features. A loss function tailored to this framework is also developed. Experimental results show that, against the Carlini-Wagner (CW) attack, our method achieves an RI of 25.14% and an average robust accuracy of 48.99% on the NTU RGB+D dataset, outperforming the latest SimMin+ExFMem method by 8.55 and 23.79 percentage points, respectively. These results indicate that our approach outperforms existing methods both in enhancing robustness and in balancing robustness against accuracy.
computer vision; multimodal; RGB-skeleton action recognition; adversarial training
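
The abstract describes the FIM only at a high level: global information is extracted from both modalities, combined into a joint representation, and used to calibrate each modality's features. The following is a minimal illustrative sketch of that general idea, not the paper's actual module. It assumes global average pooling as the "global information", simple concatenation as the joint representation, and sigmoid channel gates as the calibration step; the function names and the weight matrices `w_rgb` and `w_skel` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def global_descriptor(feature_map):
    """Average each channel over its positions (global average pooling)."""
    return [sum(channel) / len(channel) for channel in feature_map]


def calibrate(rgb, skeleton, w_rgb, w_skel):
    """Rescale each modality's channels with gates computed from a joint descriptor.

    rgb, skeleton: feature maps given as lists of channels (lists of floats).
    w_rgb, w_skel: hypothetical gate weights, one row per channel of the
                   corresponding modality, each row as long as the joint descriptor.
    """
    # Assumed joint representation: concatenated global descriptors of both modalities.
    joint = global_descriptor(rgb) + global_descriptor(skeleton)

    def gates(weights):
        # One gate in (0, 1) per channel, driven by information from both modalities.
        return [sigmoid(sum(w * j for w, j in zip(row, joint))) for row in weights]

    def rescale(feature_map, channel_gates):
        return [[g * v for v in channel] for g, channel in zip(channel_gates, feature_map)]

    return rescale(rgb, gates(w_rgb)), rescale(skeleton, gates(w_skel))


# Toy inputs: 2 RGB channels and 1 skeleton channel, 2 positions each.
rgb = [[1.0, 3.0], [2.0, 2.0]]
skeleton = [[0.5, 1.5]]
# All-zero weights make every gate sigmoid(0) = 0.5, so each feature is halved.
w_rgb = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
w_skel = [[0.0, 0.0, 0.0]]
rgb_out, skel_out = calibrate(rgb, skeleton, w_rgb, w_skel)
```

Because each gate is computed from the joint descriptor, a perturbation in one modality changes how the other modality's channels are weighted; in a trained model the gate weights would be learned, presumably together with the framework's loss function.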