Three-Stage Adversarial Perturbation Generation Active Defense Algorithm for Facial Attribute Editing
With the gradual maturation of deep generative technology, facial images produced by facial attribute editing techniques have become nearly indistinguishable from genuine ones. Once such technologies are maliciously exploited, for example to infringe on personal privacy or to manipulate public opinion, they may trigger moral, social, and security problems. Although current forensics-based passive defense technology has achieved considerable performance against such malicious editing, it can only provide evidence of tampering after the fact and cannot prevent it, so it can hardly eliminate the losses that malicious tampering causes. Active defense technology has therefore emerged: it prevents facial images from being tampered with by disrupting the output of facial attribute editing models. However, the existing two-stage training active defense framework for facial attribute editing suffers from insufficient transferability and insufficient perturbation robustness. This paper therefore proposes a three-stage adversarial perturbation active defense framework for facial attribute editing by optimizing the two-stage training architecture and its loss functions and by introducing an auxiliary classifier. First, the substitute target model in the two-stage training architecture is modified, and an attribute editing loss is designed for training the perturbation generator, which improves the reconstruction quality and the attribute constraint ability of the substitute model and thus alleviates its overfitting. Second, an auxiliary classifier is introduced in the training phase to classify the source attributes of the encoded features extracted by the substitute model, and a corresponding auxiliary classifier loss is designed for training the perturbation generator; the original two-stage alternate training is then extended to a three-stage alternate training of the substitute target model, the auxiliary classifier, and the perturbation generator, which is expected to strengthen the active defense against the tampering model by countering the auxiliary classifier. Finally, an attack layer is introduced into the training of the perturbation generator to enhance the robustness of the adversarial perturbation against filtering and Joint Photographic Experts Group (JPEG) compression. Experimental results on five facial attribute editing models (StarGAN, AttGAN with difference attribute vector input, AttGAN with target attribute vector input, STD-GAN, and a style-aware model) show that the proposed framework transfers active defense from the white-box substitute model to black-box attribute editing models better than existing frameworks, improving the peak signal-to-noise ratio (PSNR) by 16.17% in the black-box case, and that the generated adversarial perturbation is more robust to JPEG compression and filtering than the baseline, improving PSNR by 13.91% under JPEG compression and by 17.76% under Gaussian filtering.
facial attribute editing; active defense; adversarial attack; auxiliary classifier; alternate training