Counterfactual explanations alter a model's output by making minimal, interpretable modifications to the input data, revealing the key factors that influence the model's decisions. Existing counterfactual explanation methods based on diffusion models rely on conditional generation and therefore require additional classification-related semantic information. However, ensuring the quality of this semantic information is challenging, and acquiring it increases computational cost. To address these issues, an unconditional counterfactual explanation generation method based on the denoising diffusion implicit model (DDIM) is proposed. By leveraging the consistency that DDIM exhibits during the reverse denoising process, noisy images are treated as latent variables that control the generated outputs, making the diffusion model suitable for an unconditional counterfactual explanation generation workflow. The method then fully exploits DDIM's ability to filter out high-frequency noise and out-of-distribution perturbations, reconstructing the unconditional counterfactual explanation workflow so that it generates semantically interpretable modifications. Extensive experiments on different datasets demonstrate that the proposed method achieves superior results across multiple metrics.
Key words
Deep Learning/Interpretability/Counterfactual Explanation/Diffusion Model/Adversarial Attack