Towards robust explanations for deep neural networks


Full-text links:

NSTL
Elsevier

Abstract: Explanation methods shed light on the decision process of black-box classifiers such as deep neural networks. But their usefulness can be compromised because they are susceptible to manipulations. With this work, we aim to enhance the resilience of explanations. We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model. Based on these theoretical insights, we present three different techniques to boost robustness against manipulation: training with weight decay, smoothing activation functions, and minimizing the Hessian of the network. Our experimental results confirm the effectiveness of these approaches. © 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
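
As a rough illustration of the second technique named in the abstract (smoothing activation functions), the sketch below compares ReLU with a softplus activation. The choice of softplus, the sharpness parameter `beta`, and all function names are assumptions made for illustration; the abstract does not say which smooth activation is used. The point shown is only the standard trade-off: the second derivative of softplus is bounded by `beta / 4`, while its largest deviation from ReLU is `log(2) / beta`.

```python
import math


def relu(x: float) -> float:
    """Plain ReLU; its curvature is undefined (a kink) at x = 0."""
    return max(0.0, x)


def softplus(x: float, beta: float = 1.0) -> float:
    """Softplus (1/beta) * log(1 + exp(beta * x)), written in a
    numerically stable form. `beta` is a hypothetical sharpness knob."""
    z = beta * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def softplus_curvature(x: float, beta: float = 1.0) -> float:
    """Second derivative of softplus: beta * s * (1 - s), where
    s = sigmoid(beta * x). It never exceeds beta / 4."""
    z = beta * x
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    return beta * s * (1.0 - s)


if __name__ == "__main__":
    # Smaller beta: smoother activation, but further from ReLU.
    for beta in (1.0, 10.0, 100.0):
        gap = softplus(0.0, beta) - relu(0.0)    # largest gap is at x = 0
        curv = softplus_curvature(0.0, beta)     # largest curvature is at x = 0
        print(f"beta={beta:6.1f}  max gap={gap:.4f}  max curvature={curv:.2f}")
```

Under these assumptions, lowering `beta` reduces the curvature of each activation at the cost of a larger deviation from ReLU, which is one way to read the abstract's link between smoothing and reduced manipulability.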

Keywords:

Explanation method; Saliency map; Adversarial attacks; Manipulation; Neural networks

Authors:

Dombrowski, Ann-Kathrin; Anders, Christopher J.; Mueller, Klaus-Robert; Kessel, Pan


Affiliation:

Tech Univ Berlin

Publication year:

2022

DOI:

10.1016/j.patcog.2021.108194

Pattern Recognition

EI, SCI

ISSN: 0031-3203

Year, volume (issue): 2022, 121

Citations: 10
References: 64